Family connections


A high-profile arrest in California shows how the long arm of the law can now extend into 
DNA databases to check for relatives of suspected criminals. 


Last week’s arrest of a suspect in the Golden State Killer case in
California has highlighted how DNA samples that have been
volunteered for one purpose — in this case, genealogy — can 
be used for other reasons, often without the donor’s explicit consent. 
Several ethicists have expressed concern about US detectives using 
a genealogy website in this way. Coming so soon after the reuse of 
Facebook data in political campaigns in the Cambridge Analytica 
scandal, it’s another example of how new technology and techniques 
lead to unexpected conundrums, and how ethical and societal debate 
must catch up. 

The case of the Golden State Killer, linked to at least 50 rapes and 
12 murders between 1976 and 1986, had gone cold — although inves- 
tigators believed they had a reliable sequence of the perpetrator’s DNA. 
Next they needed a match. So, according to reports, they uploaded the 
data to a popular website that compares people's genetic information 
to trace their relatives — in effect, creating a profile for him. They got 
lucky: a match with family members led them to identify and arrest 
Joseph James DeAngelo. 

Just like the Cambridge Analytica case, this one raises the question 
of how much control people have over information they give to public 
or commercial databases. DeAngelo’s relatives submitted their DNA
for the specific purpose of genealogy, which by definition requires 
the information to be shared and compared. Then they saw it used for 
something else without their consent. In discussions of the case, users 
of genealogy services are divided between those who say the police 
were justified, given the seriousness of the crimes, and those who were 
shocked by the move. 

Such users have received other surprises. Thousands of people have 
discovered through genetic analyses that their parents were not who 
they thought they were. Others have found and been reunited with 
siblings they never knew existed. Such discoveries have implications 
for users’ wider family members, most of whom won’t have put their
DNA in such a database. 

In the California case, the involvement of the police adds an extra 
dimension. People who choose to upload their DNA could unknow- 
ingly be helping police to trace a relative — now and in the future. 

Investigators have long coveted the genetic information held in 
others’ databases. After the Swedish politician Anna Lindh was assas- 
sinated in 2003, Swedish police asked for access to a suspect’s DNA 
stored in a biobank, so that they could compare it with DNA found 
at the crime scene. Their access was granted. But other requests have 
been turned down by courts. In 2006, the Norwegian Supreme Court 
said that police investigating a suspected armed robber, who had 
died six months after the crime, could not access his genetic informa- 
tion held by a hospital. The dead can't be libelled, but they can have 
their privacy invaded. And scientists in Belgium wrestled with these 
issues in 2016, when they confirmed the location of the 1934 death of 
King Albert I from blood samples collected there. They decided not to 
publish sequence details because of possible implications (including
paternity and health) for his surviving descendants, including
members of the current Belgian and British royal families. 

To what extent can scientists and companies who collect such 
information anticipate future uses and make them clear to participants 
and customers? There are no easy answers, because many of those uses 
cannot be anticipated at the time. Still, a 2016 survey showed that online
firms that collate and compare DNA for consumers are too vague about how
it might be used (E. Niemiec and H. C. Howard Appl. Transl. Genom. 8,
23-30; 2016). On the basis of what they do know, many organizations
should take steps to inform people better. (In the California case,
users of the site, GEDmatch, were told that “other uses” were
possible and that they should remove their data if this was unacceptable.) 
If police can use genetic databases to catch killers — even those 
who are distant relatives of individuals who have submitted their 
DNA — then perhaps more people will sign up to share their DNA. 
But they should be told that this is a possibility, and be given the choice 
to opt out. Meanwhile, more geneticists, ethicists and lawyers need 
to debate other potential ways in which genetic information is likely 
to be used, so that ethics leads the conversation, rather than playing 
catch-up.


All that ails you 


To improve health care, researchers need to 
study diseases as they occur: in combination. 


When public-health researcher Tolullah Oni travelled from
London to South Africa to study HIV, she soon realized
she would have to broaden her focus. Physicians there were 
grappling with twin epidemics — HIV and tuberculosis. The infec- 
tions often coincide, and so clinicians were working to integrate their 
treatment of the two diseases. 

But Oni found that many of her patients were dealing with a third 
problem. “We started seeing people who came in with good adherence 
to their medicines, but somehow someone had missed the fact that 
their blood pressure was through the roof,” she says. To bring them
back to health, she would need to treat non-communicable diseases 
such as high blood pressure and diabetes as well. “We were treating 
conditions and not people.” Oni went on to study the phenomenon 
in her patient community (T. Oni et al. BMC Infect. Dis. 15, 20; 2015)
and is hoping to take the lessons learnt from integrating care of HIV 
and tuberculosis and apply them to other combinations of diseases. 

People are complicated, and their medical problems rarely come
neatly packaged as the single diseases that scientists and doctors study. 
A report released on 19 April by the UK Academy of Medical Sciences 
(see go.nature.com/2jhmcvf) details the challenges of studying and 
treating individuals who have multiple medical conditions, known as 
multimorbidity. Variations in the definition and frequency of multi- 
morbidity across populations have led to wide estimates of its preva- 
lence, ranging from 13% to 95% of patients globally. The report offers 
a list of recommendations on what health-care providers can do to 
address the problem of multimorbidity, and identifies the knowledge 
gaps that need to be filled. 

Researchers should take heed: if their work is to translate to the real 
world, more scientists — at the clinic and the bench — should shift 
their focus to look at interactions between disorders. 

Multimorbidity seems to be growing in countries where the popula- 
tion is ageing and thus more people are living with chronic diseases, 
and in countries grappling with chronic infectious diseases such as 
HIV. Health-care providers should look again at how doctors tend to 
specialize in specific disorders, when it might be better to arm them 
with the ability to recognize and treat a range of conditions. 

Clinical trials have historically focused on single diseases. They 
often exclude participants with other conditions to boost the chance 
of getting a cleaner data set (and to reduce risks of unintended harm). 
But this is beginning to change as part of a push to lower eligibility 
requirements for many clinical trials. Researchers are also increasingly 
focusing on supplementing data from carefully controlled clinical tri- 
als with ‘real-world evidence’ — much messier data collected from 
people who may be taking multiple medications and dealing with mul- 
tiple conditions. Such studies are a good way to start understanding 
the effects of multimorbidity. In this issue, a World View describes how 
to make sure people with anxiety disorder and other complications
are integrated into clinical research of pain treatments (see page 7).

There is more to be done. As the report highlights, clinical research- 
ers need to characterize multimorbidity around the world, looking at 
which conditions are most likely to coincide and in which populations. 
Already, evidence shows that this varies dramatically by location and 
wealth. More-deprived individuals in wealthy countries, for example, 
might be more likely to have multiple chronic diseases; whereas in 
poorer countries, wealthier individuals might be more likely to have
multiple conditions. Such studies could identify the most prevalent and
harmful clusters of disease, and so help to focus basic research. Bench
scientists also tend to focus on one disease at a time,
even if their work sometimes yields insights
into a range of conditions. More effort should 
be put into studying complex combinations of disorders and how they 
— and their treatments — interact. Studies of ageing, for example, are 
detailing the causes of inflammation and its impact on multiple organs 
in the body (M. N. Bouchlaka et al. J. Exp. Med. 210, 2223-2237; 2013). 

This requires support from funders, and a wider recognition that 
the most tractable projects with the cleanest, easiest to interpret results 
might not be the most worthy of funding. Studying diseases in com- 
bination is challenging, but computational and laboratory tools are 
increasingly available to handle complex data sets and tease out mean- 
ing from messy data. 

Some funders are already taking steps in this direction: an upcom- 
ing workshop held by UK charity the Wellcome Trust, the UK Medical 
Research Council and other organizations will look at how research 
can better tackle multimorbidity. This movement needs support in the 
coming years. Awareness of multimorbidity has been growing steadily: 
now the question is how best to deal with it.




Human embryo and 
stem-cell research 


Research using human embryos and embryonic stem cells draws intense
ethical scrutiny and places demands on scientists, funders and journals
to follow the relevant regulations. As a
publisher of such work, Nature and the Nature journals take this 
responsibility very seriously. For many years, Nature journal editors 
handling manuscripts on human embryo and stem-cell research 
have assessed the ethical oversight of the work when deciding 
whether to publish it. We are now formalizing and amending 
aspects of this publication policy. 

Nature journals encourage stem-cell scientists to embrace guide- 
lines agreed in 2016 by the International Society for Stem Cell 
Research (ISSCR) as they design, execute and report their research. 
These ‘Guidelines for stem cell research and clinical translation’ 
describe rigorous standards for stem-cell research consistent with 
international policies that govern biomedical science and clinical 
trials. To encourage scientists to follow these guidelines, we have 
identified categories of manuscripts for which we will require authors 
to send an accompanying ethics statement or will consult an ethicist 
reviewer. 

Under this policy, Nature journals will require an ethics statement 
from the authors for papers that involve human embryos or gametes, 
and for clinical studies of cells derived from pluripotent stem cells. 
This statement must highlight ethical oversight of the work, includ- 
ing the review boards specialized in embryo research that approved 
it, and details of the consent process for cell donors and recipients.

For manuscripts that we consider especially sensitive, Nature 
journals will request assessment by an independent ethicist along- 
side scientific peer review. Such manuscripts will include, but will 
not be limited to, those reporting genome engineering of human 
embryos or clinical work with gametes or cells derived from pluri- 
potent stem cells. These ethicist reviewers may provide guidance 
on formulating the ethics statement to ensure accurate and trans- 
parent reporting of approval conditions. Authors may be asked to 
submit redacted informed-consent documents and review-board 
documents for evaluation by the ethicist reviewer. 

Independent ethics review will also be required for manuscripts 
reporting work in which intact human embryos or embryo-like 
structures are kept alive for close to 14 days, a time point that corre- 
sponds to the formation of the primitive streak and the acquisition 
of organismal potential. 

At present, many countries — and the ISSCR guide- 
lines — prohibit culture beyond 14 days, a restriction that reflects 
the conclusions of the 1984 UK Report of the Committee of Inquiry 
into Human Fertilisation and Embryology (also known as the 
Warnock report). Whether this rule should be relaxed is currently 
being debated, triggered in part by technological advances that 
enable scientists to reconstruct human embryo-like structures 
from stem cells. 

As this and other debates unfold, we anticipate the need to revisit 
some aspects of our policy in accordance with shifts in best practices 
for the stem-cell field, driven by advances in science and technol- 
ogy and evolving social norms. Nature fully supports an inclusive 
approach to such discussions, involving broad consultation and 
dialogue. We hope that our policy complements these efforts by 
scientists, ethicists, regulators, policymakers and funding agencies.
WORLD VIEW
A personal take on events


To treat pain, study people
in all their complexity


Clinical research needs to investigate not simply drugs, but the psychology of
why and how individuals experience pain, says Beth Darnall.


Last month, the US National Institutes of Health (NIH) formally
launched a multi-agency effort to combat the country’s opioid-addiction
crisis. Funds for research into controlling opioid misuse and treating
pain will nearly double in 2018, to US$1.1 billion.

The forces behind this epidemic extend beyond overprescription:
most of the tens of thousands of deaths caused by opioid overdose in
the United States each year result from illicit use. Still, an inadequate
understanding about how to treat pain has certainly contributed. We
need to characterize patients better, and we need more studies that
incorporate non-drug treatments alongside any form of medication.

Consider this crucial question: what is the first treatment you should
give a person for chronic pain, or even many acute injuries? Most
clinicians now agree that the answer should not be opioids. Fewer recognize
that the question is not which pill to use instead, but what system of
interventions (including medication) and monitoring to implement.

Too often, pain is treated as a purely biomedical problem. It is a
biopsychosocial condition. Psychological treatment can be combined with
medication to equip people with the tools to better control their pain
experience. Psychological therapies can also lower risks such as
addiction, because the emphasis is on engaging patients in managing
their daily actions to help themselves to feel better in the long run,
rather than relying solely on passive medications. Yet a common clinical
practice is to recommend such psychosocial strategies for pain only
after all medications have failed.

It is hard for clinicians to learn which treatments to use, because our
research system shuns the very patients we need to understand.
Pain-research trials often exclude adults who have depression, anxiety
and other disorders, those who take other prescription medications
and those over the age of 70, who tend to have multiple co-morbidities.

To treat pain better, we should attend to these complex patients,
rather than exclude them. One effort to do so is the Collaborative Health
Outcomes Information Registry, or CHOIR (http://choir.stanford.edu),
which my colleagues developed with NIH support. The platform
collects data on the patients we personally see in our pain clinic every
day: their age and sex; how they are sleeping; how pain affects their daily
routine; their mobility, strength and endurance; how they engage with
friends and family; which other medications they take; what other
diagnoses they have. It also tracks treatments and responses over time.
Clinicians can easily follow their own patients’ progress, and the system can
be programmed to recommend tailored treatments or patient education.

Researchers can look for patterns among groups of patients. For
example, several studies suggest that most people taking opioids
long-term do not benefit from them (see go.nature.com/2vylvkp). However,
almost all clinicians who treat chronic pain observe that some people
do quite well on opioids. We need to be able to predict who those
individuals are. Otherwise, we are either going to exclude people from a
treatment that benefits them or expose them to a risky medication.

We know, for instance, that people who worry more about pain, or 
who report feeling helpless in the face of it, are at risk of prolonged pain 
and opioid use after surgery (M. M. Wertli et al. Spine 39, 263-273; 
2014). My colleagues and I are currently assessing whether an online 
education app can help patients to manage their worries, decrease pain 
and limit opioid use after surgery. 

More such pragmatic clinical trials are needed. So are accessible tools, 
such as CHOIR, to implement these trials. We are currently building a 
CHOIR network across the United States, Canada and Israel to integrate 
data and answer questions about which of several commonly used pain 
treatments works best, and in which individuals. Ideally, we will then 
use the results of these trials to inform clinicians continuously about the 
most safe and effective treatment to prescribe for their patients. 

And we need to study how placebo effects could enhance pain 
treatment, by deliberately integrating them into 
clinical trials. I am not talking about sugar pills,
but about a strategy called placebo optimization. 
Simple pain-science education, cognitive regu- 
lation and relaxation skills can help empower 
patients to reduce pain processing in the brain, 
gain better control over their symptoms and 
garner more benefit from medical treatments. 

Patients can actually be primed for relief. For 
instance, placebo optimization could involve 
emphasizing to patients that we have evidence 
suggesting that various treatment plans — such 
as gently tapering opioid dosing — can be done 
without increased pain. Clinicians also need 
strategies for detecting and minimizing ‘nocebo
effects’: in this case, negative expectations and fears
about pain that can undermine the effectiveness of medical treatment. 

We need to incorporate psychology and complexities into clinical 
trials and medical care. More funding for treating opioid addiction 
and misuse is welcome. But essential, too, are funds for investigating 
pain as a condition in itself. 

In 2016, the Institute of Medicine estimated that up to one-third of 
the US population lives with ongoing pain. Chronic pain, the main 
cause of disability, is more prevalent than diabetes or heart disease. It 
costs the US economy up to $630 billion every year in health care and 
lost productivity, and lowers the quality of too many lives. Although 
precise numbers are hard to come by, NIH spending breakdowns show 
that the agency committed just over $500 million in 2017 to broad 
pain research. Finding better ways of treating pain is surely worth a 
greater investment.


Beth Darnall is a clinical professor in the Department of 
Anesthesiology, Perioperative and Pain Medicine at Stanford 
University in California. 

Twitter: @bethdarnall 
AI boost for Europe

The European Commission will increase its spending on artificial
intelligence (AI) to €500 million (US$604 million) per year for three
years starting in 2018, up from about €300 million the previous year.
The cash, announced on 25 April, is part of an initiative designed to
boost Europe's standing in the field. Other plans include creating
ethical guidelines for AI development and proposing legislation to
increase the amount of publicly available data. Separately, a group of
prominent European AI researchers signed an open letter on 24 April
warning that the continent's AI laboratories, investments and companies
are not keeping pace with rivals in North America and China. The
statement calls on European governments to create an AI institute with
sites in several countries, similar in scope to the European Molecular
Biology Laboratory. Each location should have an initial investment of
around €100 million, the letter says.


Facebook data
Facebook's tightening of third-party access to user data in the wake of
the Cambridge Analytica controversy risks hampering research, a group
of academics has argued. In an open letter published on 25 April,
prominent data and Internet scientists said that restrictions on how
third parties access social-media data were likely to diminish
transparency and independent oversight of such platforms. They welcomed
a Facebook initiative, announced on 9 April, to encourage peer-reviewed
research on the role of social media in elections and democracy, but
they said that the proposal’s narrow terms of reference and use of a
hand-picked panel of scholars to define the research agenda mean that it
risks failing to support independent research.


The news in brief


Nations join to watch for glacier collapse

UK and US polar scientists are launching a £20-million (US$27-million)
effort to probe the Thwaites glacier in Antarctica. The five-year
project, announced on 30 April, is the biggest joint Antarctic effort by
the nations in more than seven decades. The programme will fund eight
studies and is set to begin this October. Researchers will gather radar,
seismic and other data on the glacier to understand whether it is headed
for collapse. The glacier’s drainage basin covers an area roughly the
size of Britain on the West Antarctic Ice Sheet, and it already accounts
for about 4% of global sea-level rise. The programme is being funded by
the UK Natural Environment Research Council and the US National Science
Foundation.


Suspected killer
Law-enforcement officials in California used DNA data found on a
genealogy website to track down a suspected serial rapist and murderer,
raising widespread concerns about genetic privacy. The Sacramento Bee
newspaper broke the story on 26 April, reporting that investigators had
used the free database, called GEDmatch, to find relatives of the
suspected ‘Golden State Killer’, whose alleged crimes date back to the
1970s. In a statement, GEDmatch said that it had not been consulted
about this use of its data, and noted that it had always warned users
that the database could be used for other purposes. The suspect, who is
72, has so far been charged with 8 murders. See page 5 for more.


POLICY


Letter to Trump
Nearly 700 members of the US National Academy of Sciences (NAS) have
signed a public letter that denounces the administration of US President
Donald Trump for its hostility to science. The statement, published on
23 April, admonishes Trump for withdrawing the United States from the
Paris climate accord, and warns of the consequences of disregarding
scientific evidence. The letter encourages the administration to
maintain scientific content on publicly available websites, to appoint
qualified people to posts requiring scientific knowledge, to stop
intimidating government researchers and to rejoin the Paris Agreement.
The members signed as individuals, and not on behalf of the NAS.


Insecticide ban 


The European Union has 
voted to ban the use of certain 
controversial neonicotinoid 
insecticides on all outdoor 
crops. The vote, which took 
place on 27 April, ends years of 
bitter wrangling between those 
supporting a ban, including 
environmentalists and many 
scientists, and opponents. An 
influential scientific review 
concluded in February that the 
insecticides pose a high risk
to wild bees and honeybees.
The three neonicotinoids
of greatest concern for bee
health — clothianidin, 
imidacloprid and
thiamethoxam — will not be 
allowed to be used outdoors, 
but can be used in permanent 
greenhouses. The ban is 
binding for all EU member 
states, and it will enter into 
force by the end of 2018. 


Mars-rock return 
NASA and the European Space 
Agency (ESA) are considering 
a joint mission to bring soil 
samples from Mars to Earth,
a statement announced on
26 April. The venture, which
would help to illuminate 
Mars’s potential to harbour 
life, is no small feat. It would 
require both agencies’ future 
Mars rovers, which are set to 
land on the red planet in 2021, 
to collect soil samples from 
the Martian surface and just 
beneath. A third rover would 
pick up the samples and place 
them in a rocket to be launched 
into a Martian orbit, where
it would rendezvous with a
spacecraft that would fetch 
the specimens and bring them 
to Earth. Plans for the rover 
and spacecraft are yet to be 
approved. 


NASA has apparently cancelled a future lunar-rover mission, despite a
directive from US President Donald Trump to focus on returning to the
Moon. The Resource Prospector spacecraft was scheduled to launch in 2022
to mine substances such as hydrogen and water. Results from that mission
would have been used to inform human exploration of the Solar System. On
27 April, NASA head Jim Bridenstine tweeted that Resource Prospector’s
instruments would still be used, presumably separately from the rover,
in missions to the surface of the Moon. One influential group of lunar
scientists, the Lunar Exploration Analysis Group, criticized the move
and argued that the rover should be a key component of NASA’s renewed
focus on lunar landings.


TREND WATCH

Scientists have reason to hope that North Korea will soon open up to
more collaborations after historic peace talks last week with South
Korea. North Korea publishes little research, but its output is growing.
Its scientists published about 80 articles in international journals
last year, more than 4 times their 2014 output, according to the Web of
Science database. Some 60% of North Korean papers since 2015 name
Chinese co-authors. Main topic areas include geosciences, engineering
and materials science.

[Chart: NORTH KOREA'S SCIENCE. The isolated nation publishes just tens
of articles in international journals each year. Its researchers’ main
collaborators are in neighbouring China. Series: total; collaboration
with Chinese researcher(s). Axis: publications indexed in Web of Science
with a North Korean author.]


Nuclear site

North Korea’s mountain nuclear test site (pictured) partially collapsed
after the most recent nuclear detonation in September 2017, say Chinese
researchers. A 4.1-magnitude earthquake occurred 8.5 minutes after the
blast at Punggye-ri in the country’s north, followed by several smaller
earthquakes 20 days later; the authors say these tremors indicate rock
falling in from above the blast cavity and note that the mountain should
be monitored for potential radioactive leakage. The findings were
published on 27 April (D. Tian et al. Geophys. Res. Lett.
http://doi.org/cn3t; 2018), the same day that North Korean leader Kim
Jong-un met South Korea’s President Moon Jae-in and pledged to close the
nuclear test site in May.


Record wait

The US National Oceanic and Atmospheric Administration (NOAA) has been
without a permanent administrator for more than 15 months, a record for
the agency. US President Donald Trump nominated AccuWeather chief
executive Barry Myers to head NOAA in October 2017, pending Myers’s
confirmation by the Senate. But the Senate vote has been held up over
concerns about potential conflicts of interest. AccuWeather uses NOAA
data to provide a host of weather-related services, and is owned and
operated by Myers and his two brothers, Joel and Evan. Myers has said he
will step down and divest himself of interests in the company if he is
confirmed, but critics are sceptical that he can disentangle himself
from the business.


CRISPR arguments 


A US appeals court heard oral arguments in the ongoing dispute over
rights to key CRISPR-Cas9 genome-editing patents on 30 April. The
University of California and its collaborators are appealing against a
2017 decision by the US Patent and Trademark Office to recognize a
competing patent filed by a group led by the Broad Institute in
Cambridge, Massachusetts. In the appeal, the California team argued that
the patent office erred in deciding that the Broad’s CRISPR patent
represented a significant invention beyond that covered by the
University of California's patent. If either party is not satisfied by
the appeals court's decision, expected later this year, they could then
appeal to the US Supreme Court.


Reef rescue 


Australia’s government will 
spend around Aus$500 million 
(US$377 million) to help the 
beleaguered Great Barrier 
Reef, it said on 29 April. 
Aus$444 million will go to the 
Great Barrier Reef Foundation 
to tackle threats such as water 
pollution and invasion by 
crown-of-thorns starfish, and 
to support restoration efforts. 
Another Aus$56 million will 
be given to the Great Barrier 
Reef Marine Park Authority
to expand its management of
the reef. Critics pointed out 
that the funding ignores the 
reef’s biggest threat — climate 
change. A study last month 
found that global warming 
was a factor in the 2016 coral- 
bleaching events that damaged 
around one-third of the reef’s 
corals (T. P. Hughes et al.
Nature 556, 492-496; 2018). 
An artist’s impression of NASA’s Mars InSight lander, which will touch down in November and listen for seismic tremors.


PLANETARY SCIENCE 


Mars probe to hunt quakes 


NASA’s InSight mission will listen for seismic activity to uncover details of the red planet’s
mysterious core.


BY ALEXANDRA WITZE 


A planetary stethoscope will soon be on
its way to listen to Mars’s heartbeat.
On 5 May, NASA plans to launch 
its US$994-million InSight spacecraft from 
Vandenberg Air Force Base in California. The 
mission’s main job will be to place a seismom- 
eter on the Martian surface and listen to seis- 
mic waves pinging around the planet's interior. 
If the effort succeeds, it will mark the first 
unequivocal detection of tremors known as 
marsquakes — and explain long-standing 
mysteries about the planet’s inner structure and how it evolved.
“There are all these questions about Mars that can only be answered
with seismic data,” says Bruce Banerdt, a geophysicist at NASA’s Jet
Propulsion Laboratory in Pasadena, California, and the mission’s
principal investigator.

“It will be the first geophysical observatory
on Mars,” adds Ana-Catalina Plesa, a planetary
geophysicist at the German Aerospace Center
(DLR) in Berlin. “We are all really excited.”

On Earth, seismologists use measuring 
stations scattered around the world to detect 
seismic waves from distant earthquakes. By 
tracking how that energy bounces around 


the planet’s interior, researchers can calculate 
information such as the size of Earth’s core. 
But no one has yet done this on Mars. NASA 
tried unsuccessfully with its Viking landers, 
which launched in 1975. Viking 1 failed to 
deploy its seismometer. And although the 
Viking 2 instrument gathered about 2,100 
hours of data, all the tremors it detected, with 
one possible exception, were caused by gusts of
wind shaking the spacecraft. The seismometer 
had been mounted on top of the lander rather 
than in direct contact with Mars’s surface. 
After InSight lands, it will plop its watermelon-sized
seismometer, built by the French space agency CNES, onto the
Martian ground. The instrument will nestle
beneath a protective wind shield as its three
delicate pendulums measure even tiny tremors.
“It is pretty much the most sensitive seismometer
that’s ever been built,” says Renee
Weber, a planetary scientist at NASA’s Marshall
Space Flight Center in Huntsville, Alabama.


MARTIAN MYSTERY 

The big question is how many marsquakes it
will capture. With no actual data on Martian
seismicity, researchers have used maps of geological
faults on the planet’s surface, along
with calculations of how its interior has cooled
over time, to estimate that Mars probably has
fewer quakes than Earth but more than the
Moon. (Tectonic fractures and Earth’s tidal
pull trigger moonquakes.)

InSight will land in Elysium Planitia, a
safe, flat and geologically boring site near
the Martian equator. There, it might measure
one local marsquake each year between
magnitude 2.7 and 4.2, says Weber. It could
also detect bigger marsquakes from regions
much farther away, such as the fault-riddled
Cerberus Fossae. “Our goal is to collect something
like 30 quakes over the mission,” says
Philippe Lognonné, a geophysicist at the Paris
Institute of Earth Physics who leads the seismometer
team.

The bigger the marsquake, the more it will 
reveal about the planet's interior, because only 
the largest seismic events penetrate all the way 
to the core. InSight might see one or two such 
quakes during the two Earth years that NASA 
hopes to operate the mission. 

Marsquake data will help InSight to map
the boundaries between Mars’s crust, mantle
and core. That could reveal the depth to which
the planet’s primordial magma ocean once
churned, and whether Mars ever had anything
resembling plate tectonics. Pinning down the
size of the Martian core, thought to be roughly
half as big as Earth’s, would reveal its density
and composition. Mars’s internal layers represent
a record of its first tens of millions of years,
says Banerdt. And studying its interior might
help to reveal the early history of Earth, which
probably went through similar changes soon
after it formed.

Meanwhile, a radio-science experiment 
on InSight will measure how Mars wobbles 
on its axis, as a way to further understand the 
core’s size. And a heat-flow probe, built by a 
team at the DLR, will penetrate up to 5 metres 
beneath the surface to measure how tempera- 
ture changes with depth and time. 

The plan is to land InSight on Mars on
26 November. For Lognonné, who has been trying
to get a seismometer to Mars for more than
two decades, that day can’t come soon enough.

“I’ll be much more happy when I get the first
data,” he says.
PUBLIC HEALTH


Trial helps African children 


Pre-emptive antibiotic treatment reduces deaths in at-risk kids, but raises fears 
about the development of drug resistance. 


BY AMY MAXMEN 


To stem the rise in antibiotic resistance,
researchers recommend that people take 
these drugs only after they are diagnosed 
with a bacterial infection. But a trial involving 
nearly 200,000 children in Niger, Tanzania and 
Malawi went against that guidance in an attempt 
to save youngsters in regions where as many as 
one in ten die before their fifth birthday. 

The results, published on 25 April in
The New England Journal of Medicine, suggest
that widespread distribution of antibiotics could
prevent thousands of deaths (J. D. Keenan et al.
N. Engl. J. Med. https://doi.org/10.1056/NEJMoa1715474; 2018).
But health officials and
researchers are wary of starting a massive drug
programme on the basis of the results because it
would drive antibiotic resistance.

“It goes against dogma at the moment because
everyone else is trying to reduce antibiotic use,”
says Per Ashorn, who specializes in paediatric
infectious diseases at the World Health Organization
(WHO). The study’s results are exciting,
he says, but the WHO needs more data to evaluate
the approach.

Some officials sound more enthusiastic about 
the strategy. “As a person who was born in one 
of the poorest countries in the world, I welcome 
this,” says Samba Sow, the health minister in 
Mali, where 11% of children die before the age of 
5. “My older brother died as a child, more than 
one of my cousins died as a child — children die 
here, and they die fast.” 


Mass drug administration fell out of favour
in the late 1960s, after programmes to prevent 
malaria through the large-scale distribution of 
a drug called chloroquine backfired. Resistance 
developed rapidly because thousands of people 
with insufficient levels of chloroquine in their 
systems became infected with malaria parasites. 
Strains that were susceptible to the drug died, 
but more-resilient ones multiplied. Eventually, 
chloroquine stopped working. 

But opinions on mass drug administration
seem to be changing. Since 2012, several African
countries have reduced deaths from malaria
by pre-emptively treating millions of children
during the rainy season. And the WHO now
recommends the distribution of some antibiotics
to certain populations affected by neglected
tropical diseases, a constellation of
illnesses affecting roughly one billion people
living in poverty around the world.

The idea for the latest study came from an
analysis of the pre-emptive use of the antibiotic
azithromycin in Ethiopian communities
affected by trachoma, a disease that causes
blindness, in the late 2000s. Researchers
noticed a drop in overall deaths (T. C. Porco
et al. J. Am. Med. Assoc. 302, 962–968; 2009),
and Thomas Lietman, an infectious-disease
researcher at the University of California,
San Francisco, and his colleagues followed
up with the current trial, dubbed MORDOR
(from the French description of the project).

As part of MORDOR, children under five 
in communities in Niger, Malawi and Tan- 
zania took one dose of azithromycin twice 
a year for two years. Control populations 
received a placebo. Childhood mortality 
rates among treated communities in Niger 
dropped by 18% compared with control 
populations; Tanzania had 3% fewer deaths 
and Malawi saw a 6% reduction. 

Lietman says that Niger probably expe- 
rienced the greatest benefit because it has 
the highest childhood mortality rate of 
the three countries. About 9% of children 
die before the age of 5 in Niger, compared 
with about 5% in Tanzania and Malawi. 
Pneumonia and diarrhoea triggered by bac- 
terial infections help to drive up childhood 
mortality rates, according to a 2017 report 
from the United Nations. Poor sanitation, 
unsafe drinking water and malnutrition 
combine to make children living in poverty 
especially vulnerable to disease-causing 
microbes. They’re also more likely to die 
from curable conditions because health 
care can be unaffordable or too far away to 
be of help. 

But this antibiotics strategy comes at a 
cost, says Ramanan Laxminarayan, direc- 
tor of the Center for Disease Dynamics, 
Economics and Policy in Washington DC. 
If resistance develops against azithromy- 
cin, diseases treated by the drug, including 
gonorrhoea, would become harder to com- 
bat. He hopes that, if policymakers decide 
to implement this approach, they will 
target only the populations most in need, 
and then just for a limited time. Groups 
supporting this approach should also work 
to reduce childhood mortality in the same 
way that the developed world did, he says, 
through improved sanitation, nutrition and 
health care. 

For now, researchers will continue to 
study the effects of this antibiotics strategy. 
Later this year, similar trials will launch 
in Mali and Burkina Faso. And Lietman’s 
team is evaluating data collected during 
an extension of its trial, to assess how fast 
antibiotic resistance develops. The WHO 
plans to release a statement about whether 
this strategy is justified, and in what 
circumstances, by the end of 2019.
The Owens Valley Long Wavelength Array in California hosts the LEDA experiment.


ASTRONOMY 


Physicists trawl skies 
for enigmatic signal 


Teams rush to find faint signature from Universe’s first stars. 


BY DAVIDE CASTELVECCHI 


Researchers are heading to some of the
most remote spots on Earth, from
the Tibetan Plateau to an island in the
sub-Antarctic ocean, to try to capture an enigmatic
radio signal from the early Universe. This
grand search includes some of the first experiments
to follow up on a surprise announcement
in February that astronomers had seen
evidence of the Universe’s first stars lighting up,
a moment known as the cosmic dawn.

And as experimental physicists try to replicate
those findings in the few places on Earth
that are relatively undisturbed by radio interference,
theorists are struggling to make sense
of the data. “The signal does not look like anything
we expected,” says Abraham Loeb, an
astrophysicist at Harvard University in Cambridge,
Massachusetts.

The original detection was reported by 
researchers at the Experiment to Detect the 
Global Epoch of Reionization Signature 
(EDGES), which uses a pair of table-sized 
radio antennas in the Australian outback. The 
experiment measures the long-wavelength part 
of the cosmic microwave background, the noisy 
afterglow of the Big Bang. The researchers were 
searching for a subtle dip in the background 
spectrum where the microwave radiation is 


slightly dimmed. Cosmologists have theorized
that such a dip should have been caused by the
light of the first stars, which made primordial
hydrogen in the Universe less transparent at a
particular radio wavelength. The details of this
absorption should contain information about
the early interstellar matter and the stars that
cast light on it.

But the blip had an unexpected shape.
It suggested that the absorption started
to ramp up rapidly around 150 million years
after the Big Bang, stayed roughly constant
from 200 million to 250 million years after it,
and then disappeared relatively quickly.
The dip was also deeper than predicted,
which implied that the gas was colder than
expected during that epoch: perhaps 4 kelvin
instead of 7 kelvin.


EXTRA SCRUTINY 

The EDGES team spent two years cross-checking
their peculiar result before deciding
to go public. Researchers have posted dozens
of preprints in response, trying to interpret the
anomaly. Some physicists have suggested that it
was a possible sign of previously undiscovered
interactions between ordinary matter and dark
matter. Others saw a possible indication of
the absence of dark matter.

The EDGES researchers have now started
another round of observations with a new,
smaller antenna. They have “preliminary
evidence” that this antenna also sees the
original feature, says lead scientist Judd
Bowman, an astronomer at Arizona State
University in Tempe.

Competing experiments are also trying 
to reproduce the EDGES result. In April, 
Lincoln Greenhill, a radio astronomer 
at the Harvard-Smithsonian Center for 
Astrophysics in Cambridge, Massachusetts, 
flew to the arid Owens Valley in California 
to test a modified version of the Large- 
Aperture Experiment to Detect the Dark 
Ages (LEDA). The experiment — an array 
of antennas that look like umbrella frames 
— might have just missed the EDGES signal 
because it originally operated with filters that 
cut off frequencies above 82 megahertz. The 
EDGES signal seems to be centred at about 
78 megahertz, so is very near the top of that 
range. The team is testing filters that allow 
detection of higher frequencies. If things 
go well, Greenhill says, it might take a few 
months to collect and analyse enough data. 
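
To see why a filter cutoff of a few megahertz matters, it helps to relate observed frequency to cosmic epoch. The sketch below is illustrative only: it assumes that the absorption feature comes from the 21-centimetre line of neutral hydrogen, with a rest frequency of about 1,420.4 megahertz (the article does not name the line), and the function name is hypothetical. It applies the standard relation 1 + z = f_rest / f_obs.

```python
# Assumption (not stated in the article): the feature is the 21-cm line of
# neutral hydrogen, whose rest frequency is about 1420.4 MHz.
REST_FREQ_MHZ = 1420.4


def redshift_from_observed_mhz(observed_mhz: float) -> float:
    """Return the redshift z implied by 1 + z = f_rest / f_obs."""
    if observed_mhz <= 0:
        raise ValueError("observed frequency must be positive")
    return REST_FREQ_MHZ / observed_mhz - 1.0


if __name__ == "__main__":
    # 78 MHz is the reported centre of the EDGES signal;
    # 82 MHz is the original LEDA filter cutoff.
    for freq in (78.0, 82.0):
        z = redshift_from_observed_mhz(freq)
        print(f"{freq:.0f} MHz corresponds to z of about {z:.1f}")
```

Under that assumption, lower observed frequencies correspond to higher redshifts and therefore to earlier times, which is why the band an instrument can record determines which epoch it can probe.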

Meanwhile, at the Raman Research Institute
in Bangalore, India, Ravi Subrahmanyan
and his colleagues have quickly built a
version of their spherical antenna, called
SARAS-2, that is sensitive to the range of the
EDGES signal. They plan to deploy the new
antenna in May at a site outside town, and to
later move it to the Tibetan Plateau.

Places without radio interference are rare 
now, but “we might have the most radio-quiet
place on Earth,” says physicist Jonathan
Sievers at the University of KwaZulu-Natal in
Durban, South Africa. The spot is on Marion
Island, halfway to Antarctica, and the only
way to get there is on a ship that goes once a
year, in April. A small KwaZulu-Natal team 
led by physicist Cynthia Chiang installed its 
cosmic-dawn experiment, Probing Radio 
Intensity at High-Z from Marion (PRIZM), 
there last year. Chiang is now at the island 
station again, retrieving data from the past 
year and upgrading their telescope for new 
observations. 

But one of the quietest places for
radio astronomy in the Solar System would
be the far side of the Moon. Jack Burns, an
astrophysicist at the University of Colorado
Boulder, is leading a proposal to put a
10-metre-long wire antenna on a small lunar
orbiter. From there, the probe should detect
not only the EDGES absorption feature, but
also one from an earlier epoch known as the
dark ages, before stars existed. The feature
would appear at around 15 megahertz,
a band that is not accessible from Earth.
SYNTHETIC BIOLOGY


Genome-synthesis 
effort shifts focus 


GP-write project to make virus-resistant human cell lines. 


BY ELIE DOLGIN 


A bold plan to synthesize an entire human
genome has been scaled back to a
more technically attainable near-term
goal. Instead of synthesizing all of the human
genome’s 3 billion DNA base pairs, the project
will now attempt to recode the genome to
produce cells immune to viral infection.

Organizers of Genome Project-Write 
(GP-write), a global public-private partnership 
that includes around 200 scientists, announced 
the priority shift at a meeting in Boston, 
Massachusetts, on 1 May. 

But even the downsized ambitions might 
be difficult to achieve soon, because the two- 
year-old effort still has no dedicated funding for 
what's estimated to cost tens, if not hundreds, 
of millions of dollars and last a decade or more. 

“We thought it was important to have a
community-wide project that people could
get behind,” says project co-leader Jef Boeke, a
yeast geneticist at New York University. When
the effort launched in 2016, the creation of a
virus-resistant human cell line was listed as
one of several pilot projects that would
develop the technology to synthesize the
full genome. With the cell line now the
focus, raising money should be easier, says
Nancy Kelley, a biotechnology lawyer who is
co-leading the effort with Boeke and George
Church, a genome scientist at Harvard Medical
School in Boston.

Onlookers generally approve of the priority
shift. “This is a terrific idea,” says Martin Fussenegger,
a synthetic biologist at the Swiss Federal
Institute of Technology in Zurich. “It’s more
geared toward utilities and applications,” not
just DNA synthesis for its own sake, he adds.

A virus-proof human cell line would let firms 
make vaccines, antibodies and other biologi- 
cal drugs without risk of viral contamination. 
It could also help to make protein drugs with 
chemical ornaments similar to those in human 
proteins, to decrease the risk of the body’s 
immune system rejecting them. However, the 
organizers’ main goal is still to improve DNA 
technologies, not to create a particular product. 

“The idea is to develop the technologies to
do this very quickly and easily using a variety
of gene-editing and synthesis techniques,” says
Harris Wang, a synthetic biologist at Columbia
University Medical Center in New York City,
and a member of GP-write’s scientific executive
committee. The “ultra-safe” human-cell-line
project, Wang adds, has “the right level of
complexity, difficulty and many different facets
of design” to push those technologies forward.

One thing it doesn’t have going for it, however,
is much dedicated funding. Although
a gene-editing technology company said at the
meeting that it would donate technical expertise,
no financial backers have stepped forward.

Church estimates that the consortium has 
more than US$500 million in “related funding” 
— but he includes, for instance, $40 million ear- 
marked for his own work on synthetic-biology 
projects including engineered bacteria and 
miniature organ-like structures. He also counts 
$23.4 million for an international initiative led 
by Boeke to synthesize the yeast genome. Both 
efforts started years before GP-write. 

And the lion’s share of the related funding 
is investment money raised by loosely affili- 
ated biotech companies. Church includes it in 
his estimates not because the firms have given 
money to the effort, but because he is tabulat- 
ing what he calls “a rough-draft market sum- 
mary” of the gene-synthesis “ecosystem”. 

As such, he includes hundreds of millions of 
dollars collectively raised by eGenesis, a start-up 
that he co-founded in Cambridge, Massachu- 
setts; Twist Bioscience in San Francisco, Cali- 
fornia, of which he is a shareholder; and Ginkgo 
Bioworks, a Boston synthetic-biology company 
that last year acquired another Church-backed 
venture, Gen9. And although leaders of eGen- 
esis and Twist have been active in GP-write, 
Ginkgo senior management has not. “We're not 
involved in GP-write at all, and I’m surprised to 
see that they included us on that list of funding,” 
says creative director Christina Agapakis. 

Church defends his accounting. “It would
be great if we accomplish the goals of GP-write
entirely with pre-existing or unlabelled funds,”
he says. “Companies like Ginkgo are relevant
independent of their formal ties.”

When (and if) the consortium can secure 
funding for its ultra-safe human-cell-line 
project, the team plans to imitate previous 
efforts by Church’s lab to recode the genome 
of Escherichia coli bacteria, making it resistant 
to viruses. 
In that project, researchers swapped all
321 instances of one 3-letter genetic word, or
codon, with another that conveys the same
message. They then eliminated the gene that
allowed the cell to read the original codon. This
didn’t much affect the redesigned microbe, but
it did neutralize viral invaders because, like all
natural life, they rely on that codon for proper
protein assembly.
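
The codon swap described above can be sketched in a few lines. This is a toy illustration, not the laboratory's actual pipeline: the function name is hypothetical, and the choice of TAG as the codon being retired and TAA as its synonymous replacement is an assumption made for the example (both are stop codons in the standard genetic code; the article does not say which codons were used). The key point is that replacement must respect the reading frame, so the sequence is split into triplets before matching.

```python
from typing import Tuple


def recode_codon(cds: str, old: str = "TAG", new: str = "TAA") -> Tuple[str, int]:
    """Replace every in-frame occurrence of codon `old` with `new`.

    Returns the recoded sequence and the number of codons swapped.
    The default codons are illustrative assumptions, not taken from the article.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if len(old) != 3 or len(new) != 3:
        raise ValueError("codons must be exactly 3 letters long")
    # Split into triplets so that out-of-frame matches are left untouched.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    swapped = sum(1 for codon in codons if codon == old)
    recoded = "".join(new if codon == old else codon for codon in codons)
    return recoded, swapped


if __name__ == "__main__":
    # "TAG" occurs twice in this string, but only the final one is in frame.
    sequence, count = recode_codon("ATGGCTAGCTAG")
    print(sequence, count)  # prints: ATGGCTAGCTAA 1
```

A plain string replacement would also have altered the out-of-frame "TAG" spanning two codons, changing the encoded protein, which is why the sketch works triplet by triplet.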

Extending this recoding technique to the 
human genome won't be easy. Repurposing 


just one codon across all 20,000 human genes 
will require hundreds of thousands of DNA 
changes. It might be easier to synthesize large 
swathes of the genome rather than edit letters 
one by one. 

Church’s team used synthesis in follow-up
work to recode seven codons in the
E. coli genome. That effort needed close to
150,000 genetic changes, and it revealed unexpected
design constraints and difficulties in
stitching together DNA fragments. These have
stymied efforts to make the reconstructed
bacterium viable.

That should be a sobering reminder as the
ultra-safe human-cell-line project gets off the
ground, says Nili Ostrov, a postdoc in Church’s
lab who is leading the research. “In humans,”
she says, “there are going to be a lot of design
rules that we just don’t know.”
ENVIRONMENT


Brazil’s lawmakers push to 
weaken environmental rules 


Legislation includes proposals to open up the Amazon rainforest to agriculture. 


Trees taken in illegal logging operations in the Brazilian Amazon lie in piles at a sawmill. 


BY JEFF TOLLEFSON 


A conservative coalition that dominates
Brazil’s Congress is girding itself for a
final push to roll back environmental
regulations before campaigns for the country’s
October presidential election ramp up.

The legislation under consideration includes
proposals to open up the Amazon rainforest
to sugarcane farming, which was banned
in 2009 over concerns about deforestation.
Another proposal would weaken licensing
requirements for infrastructure such as dams,
roads and agricultural projects. But the rural-agricultural
coalition behind the proposals is
running up against public opposition that has
thwarted previous efforts to loosen environmental
rules.


Further complicating this fight is an ongoing
corruption scandal that has landed former
president Luiz Inácio Lula da Silva in jail. He
was a leading candidate in this year’s election
before his conviction.

“There’s this very delicate balance,” says
Mercedes Bustamante, an ecologist at the University
of Brasília. The conservatives have support
from Brazilian president Michel Temer as
well as the votes they need to move legislation
through Congress, she says. Lawmakers could
push forward, Bustamante adds, but they’re
wary about sparking a public backlash before
the election.

Previous efforts to scale back protected areas 
and indigenous rights in the Amazon rainfor- 
est floundered as activist groups and celebrities 
mobilized public opposition. 


The conservatives have had only one major 
success on the environmental-regulation front 
so far. In 2012, they revised the Brazilian law 
governing forests, making changes such as elim- 
inating penalties for any illegal deforestation 
that took place in the Amazon before July 2008. 
Environmental groups challenged the consti- 
tutionality of the revised law, but in February 
Brazil's Supreme Court upheld those changes. 

“It was the worst thing that could have happened,”
says Carlos Nobre, a climate scientist
in São José dos Campos and former secretary
for research and development at Brazil’s Ministry
of Science, Technology and Innovation.
But he thinks the conservative coalition’s
broader environmental agenda has stalled and
is unlikely to advance in the coming months.

Brazil was once seen as a global leader on 
environmental issues, in large part because of 
its success in curbing deforestation. Between 
2004 and 2012, the annual amount of rain- 
forest that was cleared for agriculture fell by 
nearly 84% to 4,571 square kilometres. Those 
numbers subsequently crept back up, peaking 
at 7,893 square kilometres cleared in 2016. 
However, deforestation dropped by 16% to 
6,624 square kilometres in 2017, partly because 
of lower demand for beef and the restoration of 
law-enforcement funding, which had been cut 
during a prolonged financial crisis. 
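
The percentages in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the helper names are hypothetical, and the 2004 baseline it derives is an inference from the rounded "nearly 84%" figure, not a number reported in the article.

```python
def percent_change(old: float, new: float) -> float:
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


def baseline_from_drop(final: float, drop_percent: float) -> float:
    """Return the starting value implied by a given percentage drop to `final`."""
    return final / (1.0 - drop_percent / 100.0)


if __name__ == "__main__":
    # 2016 to 2017: 7,893 to 6,624 square kilometres cleared.
    change = percent_change(7893, 6624)
    print(f"2016 to 2017 change: {change:.1f}%")

    # A drop of nearly 84% to 4,571 square kilometres implies a 2004 level of
    # roughly this size (approximate, because 84% is a rounded figure).
    implied_2004 = baseline_from_drop(4571, 84)
    print(f"Implied 2004 clearing: about {implied_2004:,.0f} square kilometres")
```

The first result rounds to the 16% drop quoted in the text, which confirms that the 2016 and 2017 figures are mutually consistent.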

The environment will certainly be on the 
election agenda, says Bustamante, because 
Lula’s first environment minister, Marina Silva, 
is one of the candidates. 

Regardless of the outcome, the political 
dynamic in Brazil’s Congress is unlikely to 
change, says Paulo Barreto, a senior researcher 
with the activist group the Amazon Institute 
of People and the Environment in Belém. The 
conservative coalition is strong, and Barreto 
thinks that it will stay in power.
Billion-star map set to transform astronomy


European Gaia mission releases most detailed 3D chart yet of Milky Way. 


BY DAVIDE CASTELVECCHI 


After a feverish wait, astronomers
around the world have an ocean
of new information to explore. On
25 April, the European Space Agency’s (ESA) 
Gaia mission published its first fully 3D map 
of the Milky Way. 

The data haul includes the positions of nearly
1.7 billion stars, and the distances, colours,
velocities and directions of motion of about
1.3 billion of them. Together, they form an
unprecedented live video of the sky, covering
a volume of space 1,000 times larger than that
captured by any previous survey (see ‘Gaia’s
gold’). “In my professional opinion, this is crazy
awesome,” says Megan Bedell of the Center for
Computational Astrophysics in New York City,
one of the many astronomers who are already
conducting studies based on the data set. “We’re
very curious to see what the community will do
with it,” says Anthony Brown, an astronomer
at the Leiden Observatory in the Netherlands
who chairs Gaia’s data-processing collaboration.

At an event at the Royal Astronomical Society
in London to present the Gaia catalogue,
astronomer Gerry Gilmore of the University of
Cambridge, UK, showed a striking video that
extrapolated from the Gaia data to simulate the
future motions of millions of stars. “Everything
moves,” he said.

The 2-tonne Gaia spacecraft, part of a €1-billion
(US$1.2-billion) mission, launched in late
2013 and began collecting scientific data in
July 2014. Gaia is in a stable orbit that remains
fixed relative to both the Sun and Earth, and
makes repeated measurements to estimate the
distances to stars, and other celestial objects,
using a technique called parallax.
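
For readers unfamiliar with parallax, the underlying relation is simple: a star's distance in parsecs is the reciprocal of its parallax angle in arcseconds. The sketch below is a minimal illustration with hypothetical function names and example angles; it uses the conversion of 3.26 light years per parsec quoted in this article's graphic, and it ignores measurement error, which dominates in practice for very small angles.

```python
LIGHT_YEARS_PER_PARSEC = 3.26  # conversion quoted in the article's graphic


def distance_parsecs(parallax_arcsec: float) -> float:
    """Return distance in parsecs from a parallax angle in arcseconds (d = 1/p)."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be positive")
    return 1.0 / parallax_arcsec


def distance_light_years(parallax_arcsec: float) -> float:
    """Return distance in light years for the same parallax angle."""
    return distance_parsecs(parallax_arcsec) * LIGHT_YEARS_PER_PARSEC


if __name__ == "__main__":
    # Illustrative angles: 0.01 arcsec and 0.0001 arcsec.
    for parallax in (0.01, 0.0001):
        parsecs = distance_parsecs(parallax)
        light_years = distance_light_years(parallax)
        print(f"{parallax} arcsec: {parsecs:,.0f} parsecs, {light_years:,.0f} light years")
```

Because distance grows as the angle shrinks, mapping stars 100 times farther away requires resolving angles 100 times smaller, which gives a sense of the precision involved.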

Alongside its 551-gigabyte database, the Gaia 
team also released a number of scientific papers. 
The goal of these was to describe quality checks 
the researchers did on the data and demonstrate 
how those data can be used; the mission's policy 
is to make the catalogue immediately available 
to the community, rather than to reserve it for 
the team’s own science studies first. 

Still, the Gaia papers describe a wealth of
original findings, said Floor van Leeuwen,
another senior Gaia scientist at Cambridge,
at the press briefing. He showed, for example,
how Gaia proved for the first time that certain
star clusters puff up at the same time as large
stars sink to their centres. “We weren’t allowed
to make discoveries, but we couldn’t avoid
making them,” he said.


GAIA’S GOLD (graphic)
Gaia has measured with high precision the positions, distances and motions of more than 1 billion stars
in the Milky Way. It covers about one-quarter of the disc of our Galaxy; its predecessor mission,
Hipparcos, mapped about 100,000 stars in a much smaller region around the Sun.
Labels in the graphic:
Galactic Centre.
Hipparcos could measure stellar distances with an accuracy of 10% up to only 100 parsecs*.
Gaia’s limit for measuring distances with an accuracy of 10% is 10,000 parsecs.
Gaia will eventually measure proper motions accurate to up to 1 kilometre per second for stars up to 20,000 parsecs away.
*1 parsec = 3.26 light years

One of those findings has implications far
beyond the Milky Way. Some astronomers are
eager to see Gaia’s measurements of certain
types of variable star that are used as ‘standard
candles’ of cosmology. Knowing the precise
distances to these stars in the Milky Way
makes them useful as yardsticks for measuring
distances to galaxies much farther away. In
particular, astronomers use standard candles
to estimate how fast the Universe is expanding,
but in recent years, measurements based on
this technique have been in apparent contradiction
with predictions made using maps of
the cosmic microwave background, the afterglow
of the Big Bang. A preliminary look at the
data shows that Gaia has improved the precision
of the standard-candle measurements,
Gilmore said at the press briefing. But, he added,
“at face value, the tension is still there”.

Dozens of preprints appeared in the days 
that followed, as teams around the world 
downloaded Gaia data and ran them through 
algorithms honed for years in preparation. 
For example, researchers are now able to test 
models of how the Milky Way formed through 
mergers of smaller galaxies and measure the 
distribution of dark matter. 

Gaia released a preliminary catalogue in 
2016, but at that time, it had not yet gathered 
enough data to directly measure the distances 
to many stars. Further data releases will con- 
tain more information and will enable entirely 
new kinds of studies; the next release will be in 
2020. The probe also monitors asteroids and 
will help scientists to track bodies that look to 
be on a collision trajectory with Earth. 

Gaia has enough fuel to keep operating 
until 2024, if nothing breaks down and ESA 
extends the mission beyond its current 2019 
end date, says project scientist Timo Prusti at 
ESA’s space-research and technology centre in 
Noordwijk, the Netherlands. The probe is in 
overall good health, he says.


The News Feature ‘The cells that sparked a 
revolution’ (Nature 555, 428-430; 2018) 
incorrectly stated that ViaCyte had restarted 
a 2014 clinical trial after redesigning its 
encapsulation technology. It had in fact 
paused enrolment on one trial and started 
another. 
The peace efforts … conflict. Now, scientists … and victims, while …


BY SARA REARDON
SPANISH TRANSLATION BY DEBBIE PONCHNER


When she began studying the people who had terrorized her country, Natalia Trujillo
braced herself to come face to face with monsters.
She would be interviewing ex-combatants from the
long and bloody conflict that had gripped Colombia for more than 50
years. The complex power struggle between
guerrilla insurgents, the government,
paramilitary groups and drug traffickers
had killed hundreds of thousands of
people and had displaced millions …
Four members of her family had been
kidnapped, and violence had
driven her father off his land. Some
of her colleagues had been through
far worse.

Trujillo, now a neuroscientist
at the University of Antioquia
in Medellín, was interested in studying
the psychological roots of violence
by observing combatants who had
laid down their arms and were trying to re-enter
civil society. Her opportunity …


… Medellín, … ex-combatants.
She and her team of … entered the enclave with a battery of
cognitive tests, panic buttons (in
case something went wrong) and some
preconceived ideas. “I thought that people
who can kill their neighbours, who
can destroy their communities, who
can have the heart to force other
people to abandon their farms, have to be
really bad,” says Trujillo. And she did meet
some who lived up to her expectations.

The municipality of Vista Hermosa could serve as a natural laboratory for the study of reconciliation.

With chains around their necks and boastful swaggers, some tried to intimidate the researchers. But more often, the scientists found ordinary people, strolling in the garden and eating ice cream with their children.

“At first I was quite disappointed”, she says. If something were wrong with their brains, that would provide an easy explanation for all the evil they had done. But after studying more than 600 combatants, she began to understand the complexity of their experiences. “I realized that not all of them are sociopaths. I realized that most of them are also victims.”

That recognition has led Trujillo and her colleagues to re-examine not only their own feelings about the ex-combatants, but also the approach the country should take in dealing with them.

Colombia’s government is currently engaged in one of the largest peace efforts in history. As part of a 2016 treaty with the left-wing guerrilla group known as the Revolutionary Armed Forces of Colombia (FARC), the government will grant amnesty to fighters who leave the conflict and complete a reincorporation programme, provided that they have not committed serious crimes. Some 6,800 FARC fighters have already entered the programme.

The effort, which is politically controversial and expected to cost 129.5 trillion Colombian pesos (US$46 billion), faces overwhelming difficulties. But it also gives scientists a unique opportunity to understand a population that has both inflicted and suffered the horrors of war. Most research into the psychological roots of violence and trauma has been done with veterans from wealthy countries who fought in conflicts far from home. Most of Colombia’s ex-combatants, by contrast, have little education and are trying to re-enter the very society that they once terrorized.

20 | NATURE | VOL 557 | 3 MAY 2018

There they face enormous stigma and resentment, which makes it hard for them to find work and form relationships with others.

A handful of scientists are studying the ex-combatants in unprecedented detail, hoping to inform and guide the peace process. They have found that years of isolation and exposure to violence might have altered the ex-combatants’ psychology and cognitive processing in subtle ways. In laboratory tests, many have difficulty empathizing with others and make flawed ethical judgements, deficits that could affect how they take part in civilian life.

Scientists are now starting long-term studies in towns that were plagued by conflict, to track how cognition and attitudes might change over the course of the reconciliation process, for ex-combatants and civilians alike. The data could eventually inform recovery efforts in other war-torn countries.

But the research is also revealing how deep the challenge runs. And some experts fear that the care available to ex-combatants in the meantime is inadequate.

“It is going to be incredibly difficult to break out of this vicious circle”, says Jiovani Arias, a psychotherapist and political scientist at the University of the Andes in Bogotá. Without investment to improve mental health, he says, the legacy of violence affecting both ex-combatants and civilians could torpedo Colombia’s fragile peace efforts.


THE ROAD TO PEACE

As a bus full of scientists pulls into the municipality of Vista Hermosa in central Colombia, Diana Matallana, a neuropsychologist at the Pontifical Xavierian University (Javeriana) in Bogotá, still cannot believe where she is. “Five years ago, you couldn’t come here”, she says. “It’s like a symbol of the hardest part of the conflict.”

Colombia’s armed conflict waxed and waned for half a century as various military groups vied for control of territory. Civilians were caught in the crossfire: more than 260,000 people were killed and 7 million were displaced during the decades of violence, according to a government registry of victims.

The Meta region, where Vista Hermosa lies, was one of many areas abandoned by the Colombian army in the 1990s and left to be governed alternately by paramilitary groups and guerrillas. It was an uneasy arrangement. The guerrillas helped to develop infrastructure, but did not hesitate to swiftly kill suspected informants. The paramilitaries, hired mainly by drug lords and wealthy political elites, tended to be more ruthless, torturing suspected spies and leaving corpses at the doors of schools. Both sides were heavily involved in cocaine trafficking and kidnapped thousands of people for ransom, Matallana’s brother among them.

Since the 2016 peace accord, FARC fighters have been allowed to enter a disarmament and rehabilitation campaign run by Colombia’s Agency for Reincorporation and Normalization (ARN) in Bogotá. The ARN had been set up years earlier, and has since facilitated the reintegration of some 20,000 paramilitaries and guerrillas who left the conflict independently or as part of another peace agreement.

The ARN now operates 26 makeshift settlements across Colombia, known as transit zones (see ‘A persistent conflict’), for FARC members who are just re-entering society. These offer services such as education and health care, and help to provide some protection for the ex-combatants, who are regular targets of former enemies, of former allies who refuse to surrender, and of civilians. After completing a programme, ex-combatants can receive identity cards that allow them to live and work legally in the country.

The prospect of their return does not thrill some residents of Vista Hermosa. A roadside sign that reads “United, peace and post-conflict are possible” has been pelted with pink paintballs. “Someone doesn’t agree”, remarks one of the researchers.

Matallana and Carlos Gómez, a psychiatrist at Javeriana, plan to launch a 10- to 20-year study that will follow more than 2,000 people from Vista Hermosa, civilians and ex-combatants alike. “We are planning, for the first time in Colombia and in the world, to learn what things help to achieve reconciliation”, says Gómez.

The team intends to measure factors such as neurodevelopment in children, social cognition and emotional regulation in adults, and the mental health of all participants, to aid the reintegration process. In a pilot project funded by philanthropic foundations, they have interviewed 200 civilians, as well as representatives of 150 ex-combatants living in a quiet zone just three hours away. “We need to have good data to see how it is working, and to be able to intervene quickly if we see that the process is not working well”, says Gómez.

It has been difficult for researchers, both here and elsewhere, to study whether programmes such as these can keep fighters from returning to crime, largely because it is often impossible to track the outcomes of the people who go through them. “We simply assume that it has an effect, and we have no other option”, says Enzo Nussio, a political scientist at the Swiss Federal Institute of Technology in Zurich.

Nevertheless, Nussio and others are hopeful about the results in Colombia. The country has far more resources to devote to the effort than nations such as Burundi and Sudan, which have undertaken similar efforts with little success.

The ex-combatants, meanwhile, face a mix of challenges, some familiar and some new. Like veterans of other conflicts, many find it hard to be around people who do not understand the experience of combat, says Thomas Elbert, a psychologist at the University of Konstanz in Germany. They may gravitate towards others who have lived in violence, which can be dangerous in a place such as Colombia, where drug traffickers and other armed groups still operate.

The Global Terrorism Database contains decades of data on attacks and kidnappings initiated by the FARC, paramilitaries, drug traffickers and other militant groups during the half-century of violence that gripped the country.
[...] lives lost in the conflict over the years.

Colombia poses some unique challenges, says Gómez. Unlike those waging civil wars in many other places, Colombia’s guerrillas are driven not by race or religion, but by political ideology. Reprogramming their hearts and minds might require different strategies from those used for other radicalized militants, war criminals or serial killers, and no one knows what those strategies should be.

Gustavo Tovar, president of the Vista Hermosa municipal council, fears that his town, and the country, are not ready for the wave of ex-combatants. “Colombia is in the middle of this metamorphosis”, he says. “We are going into it without knowing what we are doing.”


THE WEIGHT OF THE PAST

West of Meta lies Valle del Cauca, an almost flat agricultural region ringed by mountains. Here, in a luxurious hotel overlooking vineyards and sugar-cane plantations, former FARC commander Juan Carlos Sánchez is watching videos on a laptop. Designed for research, the videos show different kinds of altercation: an argument between two people, someone being hit with a chair, one person stabbing another in the back.

Sánchez joined the guerrillas in 1998, when he was living in Meta. It was not entirely by choice, he says. When the Colombian military abandoned the region, the FARC became the de facto government. The guerrillas persuaded locals to arm themselves in case the paramilitaries and the government returned, telling them that this was the only way to protect their families. Eventually, Sánchez joined the group officially.

At first, he says, the guerrillas did not attack civilians; they simply trained their recruits in politics and tactics of war. But over time, the FARC became more violent and more distrustful of others. “From the moment I entered the organization until the day I deserted, I always had questions”, he says: questions about himself, his comrades and the people giving the orders, particularly their willingness to kill former allies. But Sánchez kept his doubts to himself, because questioning orders would get him killed. Instead, he rose through the ranks, eventually leading about 25 fighters.

By 2005, he had grown more disillusioned with the FARC. He was handed a list of names of suspected informants whom he was ordered to kill; the list included children aged 12 and 13, as well as two people he had known since childhood. He ordered his troops to do it, something that torments him to this day. “You are the one who carries the burden”, he says.
Then, a few years later, the guerrillas condemned his girlfriend, also a FARC member, for spying. Sánchez made plans to flee with her, but he was discovered and had to flee on his own. He later learnt that she had been executed.

For three years, Sánchez lived in hiding in Meta, seeing enemies everywhere. “I lived in constant anxiety”, he says. Eventually, a friend told him about the government’s rehabilitation programme and he joined. He moved to Valle del Cauca and now works in the fields to make a living. He has had sessions with a psychologist and has begun reading the Bible. “Between the two, they have helped me to forgive myself and to forgive”, he says.

At the hotel, Sánchez watches the videos of altercations, interspersed with a series of questions: he is asked whether the images bother him, or whether any of the parties involved should be punished. He then tries to identify emotions in a series of faces.

“We are looking at what kinds of mistakes they make”, says Agustín Ibáñez, a neuroscientist at Favaloro University in Buenos Aires, Argentina, who is administering the test, a mock version of the one he has already given to about 350 Colombian ex-combatants. Ibáñez aims to discern how isolation from society and exposure to violence might have impaired their emotional processing and moral judgement¹. Ex-combatants, both from the FARC and from the paramilitaries, tend to have trouble distinguishing between emotions, especially fear and anger. Although it is not clear whether these effects persist outside the laboratory, Ibáñez and his team worry that emotional problems could make life even harder for the ex-combatants, as has happened to ex-combatants elsewhere.

He and Eduar Herrera, a cognitive psychologist at Icesi University in Cali, began this line of research in 2014, working with a group of paramilitaries imprisoned for war crimes. They had killed an average of 33 people each, and some were found responsible for massacring hundreds.

“The first time, we were very scared”, recalls Ibáñez. The ex-combatants were not handcuffed and came face to face with the researchers. “You have the feeling that they could kill everyone, if they wanted to.”

In 2017, the researchers found that a key characteristic of ex-combatants is how they judge the morality of an action. Most participants would condemn an attempted poisoning, for example, even if it failed to kill the target. But Ibáñez’s group has found that ex-combatants are less likely to condemn someone for a failed murder attempt, reasoning that if the victim did not die, no harm was done. At the same time, they are more likely to want to punish people for harm that is clearly accidental. By their logic, the outcome matters more than the intention.

Based on a small sample, it seems that paramilitaries are more impaired in this respect than guerrillas. Ibáñez says that this difference could make sense: people who joined the paramilitaries for a salary might have been more drawn to violence than those who joined the guerrillas for ideological reasons, although there is no way to prove this. These data suggest that rehabilitation programmes should not treat all ex-combatants in the same way.

Trujillo has also found striking differences among her participants. In a study of 624 former guerrillas and former paramilitaries², she and her colleagues found that the capacity to empathize fell into three groups: 22% of the ex-combatants functioned much like people who had not experienced violence; 32% were able to recognize pain or misfortune in others but were not as affected by it; and the rest could neither recognize feelings such as emotional distress in others nor empathize with them.

The researchers wonder whether these ex-combatants, who demobilized voluntarily, are similar to the 6,800 FARC fighters who entered the reincorporation programme as part of the 2016 treaty, many of them on the orders of their commanders. Unlike Sánchez, many remain staunchly ideological. Oscar Vega, for example, a lean and intense former FARC commander who lives in the transit zone near Vista Hermosa, steers almost every conversation towards the ways in which Colombia’s government and education system harm people. He still lives for the cause. “Our documents and our ideological line say that we have to reach the seizure of power by whatever route, whether by arms or by politics”, he says.

Trujillo is comparing several types of therapy to determine the best way to help ex-combatants to improve their performance on tests of empathy³. She and her colleagues are using electroencephalography (EEG) to monitor the ex-combatants’ brain activity, hoping to learn how they process information⁴. In as-yet-unpublished research, the team found that ex-combatants are faster than civilians at recognizing faces, even though they are slower at identifying the emotions those faces show. They are also better at memory tasks accompanied by violent images, such as blood or corpses. People whom the researchers identified as victims of violence show the opposite pattern: such images disrupt their concentration and slow their responses. The researchers think that the ex-combatants’ neural circuits have adapted to react more quickly to threats.

Trujillo’s group advises the Medellín branch of the ARN in its effort to rehabilitate ex-combatants. But trying to use science to change and make policy can be like sculpting with dry sand. “The research has been very complicated because it is a very new topic, not only for Colombia, but also for the literature on cognitive and social neuroscience”, she says. The researchers also worry that the government might lose patience. “If you can’t find something solid to show what is happening, you can’t propose a solution”, says José David López, an engineer at the University of Antioquia who works with Trujillo on interpreting the EEG data. “They need it now, not in ten years.”


A BATTLE FROM WITHIN

On her wrist, Viviana Misas bears the name of the baby she lost while living with the National Liberation Army (ELN), another left-wing guerrilla group that is still active in Colombia. Misas joined the ELN at 15 to get away from her family, and came to love the ideology and the camaraderie it gave her. But then, during a difficult march, she fell and was injured, and in the process had a miscarriage. Her comrades abandoned her and she ended up in hospital for a long time. Distrusting her loyalty after that, a fellow fighter, her best friend, betrayed her and turned her in to the Colombian army. She was captured and agreed to demobilize.

Like Sánchez, she cannot return to her home in Medellín, for fear that the ELN would kill her as a traitor. Although Misas enjoys her job as a tour guide in Valle del Cauca, depression keeps her from pursuing her dreams. “I wish that when I’m sad I could be a normal person who just got sad for a little moment and doesn’t have thoughts as strange as mine”, she says. She has not seen a psychotherapist, but her dog gives her some comfort, as does religion. Even so, her thoughts turn dark; perhaps, she says, the real reason she joined the group was that she hoped to be killed. “How do I let go of those thoughts?” she wonders.

According to ARN data, more than 90% of the ex-combatants in the programme have a psychosocial problem, such as post-traumatic stress disorder or anxiety. The ARN’s regional coordinator, Juan Fernando Vélez, says that mental health is one of his top priorities when working with them. Trujillo’s data, he says, convinced his office of the need to create a special reintegration programme for people with psychiatric problems. “We cannot give society an individual who does not feel good about himself”, he says.

Joshua Mitrotti, who led the ARN for three years before stepping down in March, says that the agency’s programmes are based on efforts carried out in Central America in the 1990s, which provided vocational training and education for armed groups. Psychosocial support is an integral component, he says.

The ARN programme for guerrillas and paramilitaries who demobilized voluntarily included 30 months of psychosocial services, delivered by about 300 psychologists and 65 social workers, on average. So far, 20,490 people have completed the reintegration process, and the ARN says that more than 70% have successfully re-entered society.

But with tens of thousands of ex-combatants in Colombia, there simply are not enough trained mental-health professionals to provide basic care, let alone intensive cognitive therapy. As a result, some fear that the reintegration programme might be providing deficient treatment. “It’s not that they are doing things badly, but they are incomplete”, says Herrera.

One challenge is the difficulty of providing treatment to adults who did not complete primary school and cannot read, a skill required for some of the usual therapies.

Mitrotti says that the ARN has been adjusting its approaches to make them more appropriate. According to the ARN, 30% of the people who used psychosocial services last year did so without any monetary incentive (ex-combatants often receive a stipend for taking part in programmes). “They don’t come because they are paid, but because they believe they need the support of our professionals”, says Mitrotti.

With a sluggish economy and continuing controversy over the accord with the FARC, many in Colombia have bleak hopes for the future.

However, help for FARC members who have recently demobilized has been slow to arrive. Andrés Restrepo, a sociologist who works in a transition zone in Caquetá, says that the FARC ex-combatants are not receiving any kind of mental-health care. Restrepo says that the ARN has promised them the arrival of six psychologists in the region, but even that would not be enough to serve the 1,000 ex-combatants and their families now living there.

Restrepo fears that if these individuals are not psychologically stable, rejection by society, including their own families, could drive them back to violence. “Nobody helped them to imagine a life without weapons”, he says.


AN UNCERTAIN FUTURE

In Piñalito, a small, dusty town of fluorescent wooden houses on the outskirts of Vista Hermosa, civilians are still getting used to peace. “It’s great, there are no dead people”, says Carlos García, an elderly retired shopkeeper. He remembers hearing gunfire every day, right outside his door, when the FARC was fighting the paramilitaries.
Now the streets are quiet and people visit the cafes. Some are missing legs: Colombia has one of the highest rates of landmine casualties in the world, and Meta is one of the most heavily mined regions. Against the backdrop of a government crackdown on cocaine trafficking, falling oil prices and growing fear that the peace is only temporary, the people of Piñalito seem to have little hope for the future.

And the peace accord itself is under threat. This month, Colombia will hold its presidential election, and the key issue is whether the accord should be renegotiated to make it less favourable to the FARC. Many of the guerrillas, meanwhile, are losing faith in the process. Some of the transit zones to which FARC members were sent to live for a time still have no running water or sanitation. And agricultural and vocational development programmes have been slow to get started. Across the country, more than half of the guerrillas have left these zones, opting to take their chances in a society that is not safe for them. Since the accord, hundreds of former FARC members have been killed.

As the ex-combatants return to society or retreat into the jungle, experts worry about the stigma they carry: that of being affiliated with the FARC and that of mental illness. Matallana hopes that one thing her research can do is show the public how trauma affects ex-combatants and civilians alike.

Resources are scarce and the problem is unimaginably complex, says Vélez. Ultimately, he says, Colombia’s success depends on the will of its people and their ability to make peace with the past. “There are no magic formulas”, he says. “The only thing we must understand is that everyone needs, and deserves, a second chance.”


Sara Reardon reports for Nature from Washington DC. The Pulitzer Center on Crisis Reporting provided travel funding for this story.
Chemist Youyou Tu, who discovered the malaria treatment artemisinin, was the first Chinese female scientist to win a Nobel prize.


Close the gender gap in 
Chinese science 


Analysis shows that extending the age limit for grants boosts the number awarded to 
women, but more must be done to achieve parity, say Ying Ma and colleagues. 


“Women hold up half the
sky” was a popular slogan
in Mao Zedong’s China of


the mid-twentieth century, intended to 
emphasize the equal importance of women 
in public and private life. But even though 
China used such slogans and had consti- 
tutional claims of gender equality decades 


before many other nations, inequalities 
persist. By 2017, just 6% of the members 
of the Chinese Academy of Sciences were
women. 

In the 1980s and 1990s, advances in the 
country’s technological capacity gener- 
ally involved importing knowledge. Now, 
China is focused explicitly on building its 


own research and development (R&D) 
and innovation. Its R&D staff swelled from 
3.2 million in 2009 to 5.8 million in 2016 
(ref. 1), and the increased demand for talent 
has highlighted the need for more female 
scientists. Currently, women make up only 
about one-quarter of this workforce. At 
the same time, increased connections 
between China and the international
community have made concerns about 
gender inequality more prominent. 

Multiple governmental and scientific 
organizations in China have taken meas- 
ures to promote women in science. Here 
we present the results of several initia- 
tives undertaken by the National Natural 
Science Foundation of China (NSFC) 
(where X.G. and L.S. work; Y.H.Z. did so 
until March 2018). 

Notably, after age limits for female grant 
applicants were extended, the percentage of 
women winning grants from a major fund 
for young scientists rose by 10 percentage points in one year.
But there is still a long way to go. 


MINDING THE GAP 

Women in China have one of the highest 
rates of participation in the labour force 
when compared with women from both large 
developed and emerging economies, such as 
the United States, Germany, Brazil and India. 
This is a legacy of its planned-economy era, 
starting in 1949, when women's participation 
in the workforce was encouraged and pro- 
tected. As late as 1988, women made up 48% 
of the labour force in China, and women’s 
average earnings were 84% of those for men. 
By 2002, however, 10 years after the country 
moved to a market economy, women made 
up 46% of the labour force and their earnings 
were 79% of those of men’. 

Many universities in China have adopted 
a policy of ‘promote or leave’. This means
that scientists gain a permanent position 
only if they pass an evaluation at the end of 
a 6-year probationary period, which often 
coincides with women’s child-bearing years. 

Like other countries, China has a
‘leaky pipeline’ for women in science — 
fewer women advance through each stage 
of a scientific career. In 2016, 53% of mas- 
ter’s students and 39% of doctoral students 
in China were women’. That proportion falls 
to 14% for recipients of the NSFC’s Distin- 
guished Young Scholars Award, which helps 
rising researchers under 45 to become lead- 
ers in their fields. 

In 2010, a joint document from the 
Central Committee of the Communist 
Party of China and the State Council called 
for the creation of policies to help talented 
men and women balance work and family. It 
advocated for a more equitable gender ratio 
in professional workplaces. In 2011, the 
Ministry of Science and Technology and the 
National Women’s Federation jointly issued 
a policy document to champion the develop- 
ment of women in science and technology 
careers. 

GENDERED ATTITUDES

Nearly 6,000 scientists across China were asked whether they agree with the following statements. The survey found viewpoints, especially among men, that could hold back women’s careers.

                 Agree    No opinion    Disagree
“Women are ‘not suitable’ for research work.”
  Men             11%        23%          66%
  Women            7%         8%          85%
“A man’s success is measured by his career, while a woman’s success is measured by her family.”
  Men             21%        21%          58%
  Women           10%         8%          82%
“Men make better project leaders.”
  Men             27%        25%          48%
  Women            –          –           81%

In 2010, a survey of the NSFC’s appli-
cants found that about 70% of women
and 24% of men supported a policy that
sought to redress historical disadvantages
through affirmative action. Measures
around maternity and parental rights were
supported by majorities of both genders’. 

Most grant and job applications in China 
already restrict eligibility by age, so changing 
these requirements offered a way to support 
female scientists. Thus, in 2011, the NSFC 
increased the age limit for women applying 
to its Young Scientist Fund from 35 to 40, 
while that for men remained at 35 (one of us, 
X.G., was involved in making this decision). 
This programme is the second-largest of 
the NSFC funds and the main way in which 
early-career scholars in basic science receive 
national funding. As of 2016, the pro- 
gramme represented 13.8% of the roughly 
US$4.1-billion budget the NSFC spent on 
projects, and financed 39% of all individual 
projects. 

Higher age limits (38 for men, 40 for women) were also established for a
new programme, the Excellent Young Scientist Fund. This supports about 400
projects a year and represents 2.2% of the NSFC budget. Another new policy
allowed women to apply to extend NSFC project terms (but not funding
amounts) by up to 24 months for maternity leave.

[Figure: MORE GRANTS FOR WOMEN. Policies to accommodate parenthood
increased female applicants and awardees for a Chinese fund for young
scientists. The chart shows the percentage of women among applicants and
awardees from 2010 to 2016; in 2011, the age limit for women rose from
35 to 40.]

Also in 2011, the NSFC pledged to 
increase the number of female scientists 
on review panels, although it did not set 
a quota. It invited review panels to con- 
sider prioritizing female applicants when 
all else was equal; enhanced the publicity 
surrounding research findings by female 
scientists financed by its programmes; 
and started to collect statistical data about 
the gender of applicants and awardees. In
2016, the Chinese Academy of Science and 
Technology for Development (CASTED) 
surveyed more than 5,800 scientists about 
their attitudes to gender roles and recent 
policies (an effort led by Y.D.Z. and Y.M.). 


EFFECTS AND EXPECTATIONS 

What happened? Raising the age bar in 2011 saw the percentage of women
applying for the Young Scientist Fund increase from 37% to 48% (see ‘More
grants for women’). Applications from women soared by 94% to 25,694; about
one-third were aged 36–40. Applications from men went up by 23%, to 28,397
in the same period. The year before, applications rose by 25% for men and
31% for women, partly owing to swelling numbers of people gaining science
and engineering PhDs: a 58% increase from 2006 to 2016.
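
The application figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope illustration: it assumes that the 94% and 23% rises are year-on-year changes relative to 2010, and it backs out 2010 application counts that the text itself does not report (the variable names are ours).

```python
# Figures quoted in the text for the Young Scientist Fund in 2011.
women_2011 = 25_694    # applications from women after the age limit rose
men_2011 = 28_397      # applications from men in the same period
women_growth = 0.94    # "soared by 94%" (assumed to be relative to 2010)
men_growth = 0.23      # "went up by 23%" (assumed to be relative to 2010)


def share_of_women(women: float, men: float) -> float:
    """Return women's applications as a percentage of all applications."""
    return 100 * women / (women + men)


# Implied 2010 counts: estimates derived from rounded growth rates,
# not numbers reported in the article.
women_2010 = women_2011 / (1 + women_growth)
men_2010 = men_2011 / (1 + men_growth)

share_2010 = share_of_women(women_2010, men_2010)
share_2011 = share_of_women(women_2011, men_2011)

print(f"Implied share of women in 2010: {share_2010:.1f}%")
print(f"Share of women in 2011: {share_2011:.1f}%")
```

Under these assumptions the 2011 share comes out at about 47.5%, in line with the quoted 48%. The implied 2010 share lands between 36% and 37%, close to the quoted 37% given that the growth rates themselves are rounded.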

The percentage of female award recipi- 
ents jumped from 33% to 43% in 2011, and 
has remained at about this level. Despite 
this increase, a female scientist’s chance of 
winning one of these grants has declined 
slightly, from 21% in 2010 (compared 
with 24% for men) to 19% in 2016 (26% 
for men). A lower success rate for women 
has been found in other programmes in 
China and internationally. It is hard to
pin this on discrimination, differences
in application styles or other reasons. We
did observe that the success rate of women
applicants aged 36–40 is lower than that
of younger women. Despite this, more 
women are now receiving these awards. 

Women’s representation on review panels 
went up by 45% between 2010 and 2017, 
to 13.3%. That is still low, but consistently 
higher than rates seen from 1986 to 2009, 
which fluctuated around 6%. 

No women have yet applied to extend their project terms for pregnancy or
child rearing. However, we think that many would have applied for
extensions had they known about the policy: in the 2016 survey, 60% of
female scientists indicated that they had never heard of it.

Although these measures have had rela- 
tively little time to influence the scientific 
enterprise in China, more than 70% of 
female scientists polled expect that each 
policy will have a positive effect in their 
discipline. 

Men who responded to the poll are less 
enthusiastic. About 60% thought that rais- 
ing age limits for female applicants would 
have a positive effect on their field, as did 
53% for extending project terms for mater- 
nity leave. Only 39% of men thought that 
increasing the number of women on review 
panels or favouring female applicants when 
all else was equal would be good for their 
fields. 


WIDESPREAD PROBLEM 

Discrimination and bias towards women in 
the workplace in China, as elsewhere, is all 
too common. A 2015 survey conducted in 
Beijing found that 87% of female university 
students encountered gender discrimination 
in their job hunt. 

Even among scientists, the CASTED 
survey found bias and burdens that must 
affect women’s careers (see ‘Gendered atti- 
tudes’). More than 20% of men and around 
10% of women agreed with the statements 
“A man’s success is measured by his career, 
while a woman’s success is measured by her
family” and “men make better project lead- 
ers”. For the second statement, 48% of men 
and 81% of women disagreed. (We did not 
ask inverted versions of the questions, such as 
whether women make better project leaders.) 

Women feel the effects of these attitudes. Thirty-two per cent of female
scientists reported that they encountered employers in their first job
search who wanted to recruit only men. Given that 84% of the women surveyed
were aged 45 or under, we must assume that most of this pool had
experienced discrimination in recent decades.

Chemist Youyou Tu worked with pharmacologist Lou Zhicen (left) on traditional Chinese medicine in the
1950s, when women’s participation in the workforce in China was encouraged and protected.


Unequal responsibilities for child rearing, 
care for older people and other domestic 
labour also hinder women’s career advance- 
ment in China, as has been reported for the 
United States. Among married scientists
in our survey, 30% of women compared 
with just 6% of men reported doing most 
housework themselves. And 2% of female 
researchers and 18% of male scientists say 
that their spouse does most of the housework. What’s more, the
gradual lifting of China’s one-child policy from 2013 has placed
more parental responsibilities on women. 

Women are less likely than men to change 
location to advance their career. One scholar 
explored why only 11.4% of Chinese recipi- 
ents of funds from a German programme 
for visiting researchers in 2011 were female, 
and concluded that the ‘price for mobility’ 
was much higher for women than for men, 
because of marriage and family.

Chinese society in general and the scien- 
tific community in particular are undergoing 
big transformations. The optimistic view 
from our perspective is that straightforward 
policy changes are helping. However, as a 
funding agency, the NSFC’s role is limited. 
It is up to institutions to make decisions 
in hiring, appraising and promotion. The 
next step would be for the rest of China’s 
research system to explicitly acknowledge that various barriers in science
prevent women from enjoying a level playing field with men, and to take
measures to eliminate the existing gender bias.
A peacock and (below) a
dodo from Francis Willughby’s 
1676 Ornithologiae libri tres. 


NATURAL HISTORY 


A prodigious 
namer of nature 


Elizabeth Yale relishes a biography of a seventeenth- 
century polymath with a notable gift for collaboration. 


Francis Willughby was a wide-ranging virtuoso in a virtuosic age.
Remembered for a pioneering study of bird classification, the
seventeenth-century natural historian pursued interests in entomology,
botany, linguistics, games and chance, and the reform of biological
classification. That he is not better known may be put down to his death
at 36. In The Wonderful Mr Willughby, ornithologist Tim Birkhead brings
his creative energies and contributions to life.

Born in England in 1635, the only son in an established family of the
gentry, Willughby inherited estates in Warwickshire and Nottinghamshire.
At the University of Cambridge, where polite young men usually acquired a
smattering of culture and influential connections, he took a different
path — into scientific discovery. He dived into the “new sciences”,
reading the works of Galileo Galilei, Francis Bacon and René Descartes.
And he took copious notes, organized by 
topic, in his commonplace book — the 
database of the era. 

A sociable man, Willughby found friends 
who spurred him on. As Birkhead relates, 
the most important was Trinity 
College fellow John Ray. Ray 
encouraged Willughby’s 
interests in mathematics 
and took him botanizing. It 
was during these jaunts that 
Willughby observed puzzling 
transformations in caterpil- 
lars that sparked entomological 
discoveries. In the late 1650s, the 
pair embarked on a programme 
of “chymical” experimentation. 


The Wonderful Mr Willughby: The First True Ornithologist
TIM BIRKHEAD
Bloomsbury (2018)

“Chymistry”, as practised by Robert Boyle and other natural philosophers,
was then evolving from medieval alchemy to modern chemistry. It sought the
transmutation of base metals into gold even as it was harnessed for
applications such as weapons manufacture. Eventually, Willughby and Ray
criss-crossed England and Wales on their birding and plant-hunting
expeditions.
Around 1662, they set themselves an 
ambitious goal: observing, describing and 
classifying all species. They felt that both the 
literature and the nomenclature sowed con- 
fusion. Swiss naturalist Conrad Gessner’s 
History of Animals (1551-58), for instance, 
mixed ancient knowledge with observation. 
By contrast, Ray and Willughby grounded 
their system in precise anatomical descrip- 
tion, distinguishing between even closely 
related species. Beginning with British spe- 
cies and extending to mainland Europe, they 
established a taxonomy that would be built 
on by centuries of naturalists, including Carl 
Linnaeus in the mid-eighteenth century. 
Dividing birds into land and water fowl, they 
deployed attributes such as beak shape to 
create a branching classification key. 
Willughby thrived on collaboration, and 
used his wealth to enable it. In 1662, Ray 
resigned his college fellowship, rather than 
subscribe to the Act of Uniformity passed 
by Parliament to fortify Charles II’s newly 
restored monarchy. Willughby invited his 
mentor into his household. The next year, 
Willughby was elected an “original fellow” 
of the Royal Society, and he and Ray, with 
Ray’s students Philip Skippon and Nathaniel 
Bacon, ventured on a tour across Europe. 
They attended university lectures and 
visited cabinets of curiosity — troves of 
exotica where they handled a hornbill’s head 
and an elephant’s tail. They collected birds’
eggs and a book of paintings of birds and 
fish from Leonard Baldner, keeper of forests 
in Strasbourg, now part of France. In rented 
rooms, they dissected and drew fish from 
Venice markets, a servant often doing the 
dirty work. They visited the museum of six- 
teenth-century naturalist Ulisse Aldrovandi 
in Bologna and attended 
human dissections. Of this 
very Protestant crew, Wil- 
lughby alone braved the 
dusty roads of Catholic Spain, 
which he viewed as a forbid- 
ding closed society. 
After they returned to 
England in the mid-1660s, 
Ray stayed on at Willughby’s 
estate as the latter married, had children and
managed his lands. Birkhead gives a won- 
derful sense of the pair’s delight in nature, 
even as Willughby, never robust, began to 
have recurring fevers. Inspired by physi- 
cian William Harvey’s discovery of blood 
circulation, published in 1628, Willughby 
contemplated the movement of sap in trees 
years before the subject surfaced in the Royal 
Society’s journal, Philosophical Transactions. 
He was the first to classify insects by their 
metamorphoses, recognizing that a caterpil- 
lar, pupa and butterfly were life stages of one 
insect, not separate species. He asked astute 
questions, such as which birds survive win- 
ters by migrating. He observed the life cycle 
of a leaf-cutter bee, later named after him 
— Megachile willughbiella. He even wrote a 
study on games, from football to cards. 

Birkhead’s account is vividly textured, 
drawing from his collaborations with science 
historians. We follow Willughby from seabird 
nesting grounds on the Isle of Man to glass- 
making factories in Murano, Venice. Willugh- 
by’s letters and notebooks, full of his swift, 
impatient writing, tell how avidly he worked. 
The strangeness of his scientifically liminal 
century shines through, exemplified by an 
“insect” collected in Italy, a fake made from a
moray eel’s jaws and a thorny plant. Birkhead 
tightens the links between Willughby’s work 
and modern biology, confirming that he and 
Ray identified some 90% of around 200 bird 
species often seen in England and Wales. 

As Birkhead emphasizes, the bond 
between the restless Willughby and the more 
restrained Ray was extraordinarily fruitful. 
Yet there were challenges, not least differences 
in social circumstances. Willughby was a gen- 
tleman, Ray a blacksmith’s son — disparities 
they finessed in life. That became more dif- 
ficult after Willughby’s death. In exchange 
for an annuity, the family expected Ray to 
educate Willughby’s children; he was reluc- 
tant. They also resented Ray’s control over 
Willughby’s posthumous legacy. They quar- 
relled over access to Willughby’s collections 
and papers as Ray produced The Ornithology 
(1676), The History of Fishes (1686) and The 
History of Insects (1710), based on his joint 
work with his friend. Subsequently, historians 
have struggled to divide the credit, sometimes 
favouring one man, sometimes the other. 

“This game of spot-the-genius is inappro- 
priate and unhelpful,” writes Birkhead. He 
invites us to see a scientific life well lived, rich
with ideas, adventure and companionship — 
and, in Willughby’s profound collaboration 
with Ray, two very different personalities who 
saw further because they worked together.


Elizabeth Yale is a lecturer in the
Department of History at the University
of Iowa. She is the author, most recently, of
Sociable Knowledge: Natural History and 
the Nation in Early Modern Britain. 
e-mail: elizabeth-yale@uiowa.edu 
Books in brief


Edge of Chaos 

Dambisa Moyo LITTLE, BROWN (2018) 

Does the “new normal” in many democracies, from high 
unemployment to political turmoil, make them poor models
for sustainable growth? In this trenchant analysis, economist
Dambisa Moyo explores that provocative question. She examines 
growth across the political spectrum, from China to the United States, 
and probes entangled challenges such as debt and protectionism. 
Unsurprisingly, she points to an urgent need for political reform. 
Her blueprint for that (including civics courses for the electorate) is 
ambitious, but, as she asserts, “All the easy choices are behind us”. 


Eye of the Shoal 

Helen Scales BLOOMSBURY SIGMA (2018) 

Marine biologist Helen Scales’s Spirals in Time (2015) opened up
a whorled wonderland of marine molluscs. This gifted writer now
deep-dives into piscine realms. Scales, whose research has spanned 
the South China Sea and Australia’s Ningaloo coral reef, weaves the 
history of ichthyology with explorations of adaptations, such as how 
glycoproteins act like ‘antifreeze’ in the blood, and why shoaling 
saves energy. Perhaps most beguiling are the hums and pops of fish 
‘calls’, which the creatures sense through the lateral line — a series 
of organs that effectively turn their bodies into giant ears. 


Now You’re Talking 

Trevor Cox BODLEY HEAD (2018) 

On average, humans utter 500 million words over a lifetime. And it’s 
a crazily complex process, as acoustic engineer Trevor Cox reveals 
in this intensive survey. Speaking involves “anatomical gymnastics” 
linked to multiple brain regions; hearing is a subtle decoding of 
tone, timbre and sense. Cox’s investigation sweeps from the putative 
protolanguage of human ancestor Homo heidelbergensis to the 
likelihood of creative algorithmic discourse. In between, he looks at 
the infant’s acquisition of language, the neuroscience of beatboxing 
(vocally mimicking percussion instruments) and much more. 


The Ghosts of Gombe 

Dale Peterson UNIVERSITY OF CALIFORNIA PRESS (2018) 

In July 1969, Ruth Davis — a volunteer at Jane Goodall’s 
chimpanzee research centre in Gombe, Tanzania — disappeared. 
Her body was found below a waterfall six days later. Goodall 
biographer Dale Peterson probes the tragedy and its convoluted 
context in forensic detail, casting back and forth from the centre’s 
primatological findings to the human stories of its researchers. 
Peterson’s engrossing, sometimes dizzyingly kaleidoscopic narrative 
is bookended by nuanced analyses of how Davis might have died, 
and the aftershocks that still rock those who knew her best. 


On Color 

David Scott Kastan and Stephen Farthing YALE UNIVERSITY PRESS (2018) 
Artistic innovator Paul Cézanne accurately noted that colour is a 
collaboration between mind and world. So remind literary scholar 
David Scott Kastan and artist Stephen Farthing in this vivid and 
erudite tour of a phenomenon that entwines microphysics and 
electromagnetics with human physiology and cognition. Their 
march through ten hues drives home why much of culture is 
deep-dyed in colour, from political affiliations (think the Greens, or 
Ireland’s Orange Order) to blue notes in music, “uncanny microtonal
slides and bends” expressive of emotional subtleties. Barbara Kiser 
Einstein said that — didn’t he?


As the physicist’s papers reach volume 15, Andrew Robinson sifts attributed quotes. 


Beyond his towering contribution to
physics, Albert Einstein was an avid
commentator on education, marriage,
money, the nature of genius, music-making, 
politics and more. His insights were legion, as 
we are reminded by this month's publication 
of volume 15 in The Collected Papers of Albert 
Einstein. Even the website of the US Internal 
Revenue Service enshrines his words (as 
quoted by his accountant): “The hardest thing 
in the world to understand is the income tax.” 

“There appears to be a bottomless pit of 
quotable gems to be mined from Einstein’s 
enormous archives,” notes Alice Calaprice,
editor of The Ultimate Quotable Einstein 
(2011); one detects a hint of despair. Indeed, 
Einstein might be the most quoted scientist 
in history. The website Wikiquote has many 
more entries for him than for Aristotle, 
Galileo Galilei, Isaac Newton, Charles Darwin 
or Stephen Hawking, and even than Ein- 
stein’s opinionated contemporaries Winston 
Churchill and George Bernard Shaw. 

But how much of this superabundance 
actually emanated from the physicist? Take 
this: “Astrology is a science in itself and con- 
tains an illuminating body of knowledge. 
It taught me many things and I am greatly 
indebted to it.” These lines, displayed by some
astrology websites as Einstein's, were exposed 
as an obvious hoax by the magazine Skeptical 
Inquirer in 2007. The real source was the foreword
to a reissued book, Manuel d’astrologie
(1965), first published by Swiss-Canadian 
astrologer Werner Hirsig in 1950. Einstein’s 
only known comment on astrology is in a 
1943 letter to one Eugene Simon: 


I fully agree with you concerning the pseudo- 
science of astrology. The interesting point is 
that this kind of superstition is so tenacious 
that it could persist through so many centuries. 


Among the hundreds of quotes that 
Calaprice notes are misattributed to Einstein 
are many that are subtly debatable. Some are 
edited or paraphrased to sharpen or neaten 
the original. “Everything should be made as 
simple as possible, but no simpler” might, 
says Calaprice, be a compressed version of 
lines from a 1933 lecture by Einstein: “It can 
scarcely be denied that the supreme goal of 
all theory is to make the irreducible basic 
elements as simple and as few as possible 
without having to surrender the adequate rep- 
resentation of a single datum of experience.” 
More certain is the provenance of “The most
incomprehensible thing about the Universe
is that it is comprehensible”. That rewords a
passage in a 1936 article in the Journal of the
Franklin Institute: “The eternal mystery of the
world is its comprehensibility ... The fact that
it is comprehensible is a miracle.”

Albert Einstein in Caputh, Germany, in 1929.

Even “God does not play dice”, arguably
Einstein's most famous quote, isn’t quite his 
words. It derives from a letter written in 
German in December 1926 to his friend 
and sparring partner, theoretical physicist 
Max Born. It is published in the new volume 
of Einstein’s papers, in which the editors 
comment on its “varying translations” since 
the 1920s. Theirs is: “Quantum mechan- 
ics ... delivers much, but does not really bring 
us any closer to the secret of the Old One. I, at 
any rate, am convinced that He does not play 
dice.” Einstein does not use the word ‘God’
(Gott) here, but ‘the Old One’ (Der Alte). This
signifies a “personification of nature”, notes
physicist and Nobel laureate Leon Lederman 
(author of The God Particle, 1993). 

Einstein’s name has also been affixed 
since his death to quotes from elsewhere. 
“The definition of insanity is doing the same 
thing over and over and expecting different 
results,” for instance, was traced by Einstein 
archivist Barbara Wolff to US writer Rita 
Mae Brown's Sudden Death (1983). “Not 
everything that can be counted counts, and 
not everything that counts can be counted,” 
was penned by sociologist William Bruce 
Cameron in his Informal Sociology (1963). 

This cosmos of quotes — real, massaged 
and faked — speaks to Einstein's status. More 
than 60 years after his death, his fame remains 
paramount. I feel there are at least four
reasons why we are still fascinated by him.

One is that Einstein’s discoveries are 
elemental and existential, unifying concepts 
of space and time, mass and energy and 
forces. They shifted our picture of reality. 
And he made more than a stab at explain- 
ing them to the non-physicist. Hence his 
part-joking encapsulation of relativity to the 
hungry press in 1921, on his first visit to the 
United States: “It was formerly believed that if 
all material things disappeared out of the uni- 
verse, time and space would be left. Accord- 
ing to relativity theory, however, time and 
space disappear together with the things.” 

There is also widespread empathy for 
Einstein’s resilience in his long struggle for 
security. His performance at his German 
school was good, but far from brilliant; he 
disliked the school for its regimentation and 
eventually abandoned it. He failed to get an 
academic position after graduation from uni- 
versity, partly because he mocked his physics 
teachers. In 1901, although semi-starving, he 
recognized the value of not conforming. He 
wrote to his fiancée that “impudence” was his
“guardian angel”. It would guide him through- 
out his life. 

Einstein was also highly engaged politi- 
cally and socially, and often in the public eye. 
He supported the creation of a Jewish home 
in Palestine, helped to establish the Hebrew 
University of Jerusalem, and in 1952 was 
offered Israel’s presidency. Yet he had writ- 
ten in a speech in 1938: “My awareness of the 
essential nature of Judaism resists the idea of 
a Jewish state with borders, an army, and a
measure of temporal power.” In 1933, he had
publicly opposed Nazi Germany, fleeing to 
the United States by way of Britain, under 
some risk of assassination. Despite encour- 
aging US president Franklin D. Roosevelt to 
build an atomic bomb in 1939, he was hor- 
rified by its use in 1945 in Japan. He spoke 
out against racial and ethnic discrimination 
in the United States. In the 1950s, he trench- 
antly criticized the hydrogen bomb and 
McCarthyism, and, right up to his death in 
1955, he was targeted for deportation as a
Soviet agent by FBI director J. Edgar Hoover. 

Finally, there is Einstein’s ineffable wit. It is
encapsulated by this aphorism, composed for
a friend in 1930 (really: I’ve checked with the
Einstein Archives in Jerusalem): “To punish
me for my contempt of authority, Fate has
made me an authority myself.”


Andrew Robinson is the author of 
Einstein: A Hundred Years of Relativity. 
e-mail: andrew@andrew-robinson.org 
Ethics review for AI
surveillance studies 


Scientists who develop 
algorithms based on user data 
face a moral dilemma: if their 
work is subverted to manipulate 
democracy or to support 
oppressive regimes, they could 
become part of something they 
would not knowingly endorse. 

In common with other fields, 
research on artificial intelligence 
(AI) and machine learning needs 
to be subject to approval from 
institutional review boards and 
compliance with data protection. 
In my view, journals should 
demand this as a condition of 
publication. Data scientists in 
industry must also adhere to 
professional guidelines from 
organizations such as the IEEE 
(see go.nature.com/2vt6ngr). And 
research that combines academic 
and corporate interests should be 
disclosed, as in other fields. 

As surveillance is combined 
with intelligent forms of 
behaviour-change technology, a 
new social contract around data 
is needed (see also H. Shah Nature 
556, 7; 2018). Unlike the media 
companies, governments and
organizations that use surveillance 
data, those who are monitored do 
not benefit. Nor do they have any 
control over how their data are 
used (uninformed consent), as 
was poignantly illustrated by the 
Cambridge Analytica scandal (see 
Nature 555, 559-560; 2018). 
Rafael A. Calvo, Dorian Peters 
University of Sydney, Australia. 
rafael.calvo@sydney.edu.au 


Poor meta-analyses 
pollute the literature 


Meta-analysis of published data 
is important in evidence-based 
medicine. However, it is an 
experiment-free route to rapid 
publication and so is open to 
abuse. Extra vigilance by peer 
reviewers and journal editors is 
called for to prevent redundant 
and conflicted meta-analyses 
from corrupting the literature. 
China produced 63% of meta-analyses
of genetic associations
in 2014, and most of those results
are misleading (J. P. A. Ioannidis
Milbank Q. 94, 485-514; 2016).
Pressure to publish may be
responsible, given that doing
actual experiments takes
much longer and can yield
insufficient clinical data. And
skilful presentation is often all it 
takes to disguise a poor-quality 
meta-analysis. 

Conclusions from arbitrarily 
merging results of variable quality 
will not resolve problems and 
should not guide clinical practice. 
A rigorous meta-analysis requires 
meticulous evaluation of the 
literature. And even high-quality 
meta-analyses in leading journals 
still need constant clinical testing 
to ensure that current guidelines 
for treatment remain valid. 

Yong Fan* Hong-Hui Hospital,
Medical College of Xi’an Jiaotong
University, Xi’an, China.
wqnspine@163.com 

*On behalf of 4 correspondents (see 
go.nature.com/2hxxxcj for full list). 


Count the costs of 
sea-bed mining 


The International Seabed 
Authority (ISA) is negotiating a 
mining code to allow commercial 
deep-sea mining of minerals to 
start worldwide. At Greenpeace, 
we argue that we should instead 
be developing a sustainable 
circular economy that reduces the 
use of virgin materials. 

There is a huge demand for 
minerals in the computing, 
renewable-energy and mobility 
sectors. So far, the ISA has 
approved 29 exploration contracts 
in the Pacific, Indian and Atlantic 
oceans. Next year, the Canadian 
company Nautilus Minerals plans 
to mine copper, zinc and gold at 
depths of 1,500-2,000 metres in 
waters off Papua New Guinea. 

In our view, the ISA should take 
more account of the biological 
and ecological impact of these 
mining activities. It conspicuously 
lacks an environmental 
committee, for example. Proper 
oversight is crucial, because 
sea-bed mining risks wiping out
pristine habitat and potentially
unknown species.
Mining-induced loss of 
biodiversity in the deep sea is 
likely to last forever on human 
timescales, given the slow natural 
rates of recovery in affected 
ecosystems (C. L. van Dover et al. 
Nature Geosci. 10, 464-465; 2017). 
We should instead be recycling 
the valuable materials contained 
in the 90% or so of the world’s 
electronic waste that is currently 
illegally traded or dumped (see 
also go.nature.com/2toh2vr). 
Sebastian Losada Greenpeace 
International, A Coruna, Spain. 
Pierre Terras Greenpeace 
International, Ar Bonou, France. 
pierre.terras@greenpeace.org


Use SDGs to guide 
climate action 


The United Nations Agenda
for Sustainable Development
commits all countries to attaining 
17 goals (SDGs) and 169 targets 
by 2030, including SDG13’s action 
to combat climate change and its 
impacts (go.nature.com/2rlwf72). 
Notwithstanding this goal’s long- 
term benefits and synergies across 
other SDGs, climate action could 
have trade-offs with several of the 
SDG targets (see also M. Nilsson 
et al. Nature 534, 320-322; 2016). 
We suggest that the SDGs should 
be used as reference points to 
map relationships between 
climate action and sustainable 
development. 

For example, climate- 
mitigation policies in carbon- 
intensive and energy-exporting 
countries could slow economic 
growth (counter to target 8.1) 
or impair industrialization 
(target 9.2) in some sectors while 
boosting others. For end uses 
of energy alone, an estimated 
US$3.5 trillion needs to be 
invested annually from 2016 
to 2050 to adhere to a warming 
trajectory well below 2°C 
(go.nature.com/2jpmtbs). 

Climate policies can also 
be socially and economically 
regressive, exacerbating 
inequality and poverty (targets 
1.1 and 1.2) through impacts on
land and food prices (target 1.4)
and putting smallholders at risk
(target 2.3). And some national 
climate-adaptation programmes 
have been linked with violent 
conflict (B. K. Sovacool World 
Dev. 102, 183-194; 2018). 
Effective policy on climate 
action and sustainable 
development requires researchers 
and decision-makers to be 
mindful of such trade-offs and of 
how they could risk undermining 
the social and political support 
needed for climate action. 
Francesco Fuso Nerini* KTH 
Royal Institute of Technology, 
Stockholm, Sweden. 
francesco.fusonerini@energy.kth.se 
*On behalf of 10 correspondents (see 
go.nature.com/2khyt96 for full list). 


Blobel’s Nobel — 
why so slow? 


Göran Hansson’s assurance
that Nobel prizes continue to
recognize the potential impact of 
a discovery is undermined by the 
lengthy interval before the award 
is won (Nature 556, 31; 2018). 
Cell biologist Günter Blobel is
a case in point (see S. Simon
Nature 556, 32; 2018). 

I attended a research seminar
by Blobel in the late 1970s as an 
undergraduate biochemist. Three 
years or so after Blobel published 
the papers Simon mentions, 
our lecturers all acknowledged 
the central importance of that 
work. The Nobel committee only 
caught up 20 years later, finally 
recognizing what was by then in 
undergraduate textbooks, when 
Blobel was in his 60s. 

This interval is typical (see 
S. Fortunato et al. Nature 508, 186; 
2014), with prize recipients often 
past retirement age — hardly a
reward for emerging excellence. 

Academic recruitment 
committees, too, tend to favour 
proven success over promising 
talent. Today, the postdoc who 
faced failure after failure in 
validating a brilliant intuition 
would not be rewarded with 
continued support as Blobel was. 
William Bains Rufus Scientific, 
Melbourn, Royston, UK. 
william@rufus-scientific.com
Faulty replication can sting


Inappropriate cellular inflammation can cause disease. It emerges that the protein SAMHD1 prevents the release of newly
replicated DNA from the nucleus, blocking an undesirable pro-inflammatory response. SEE ARTICLE P.57 


MADZIA P. CROSSLEY 
& KARLENE A. CIMPRICH 


Cells need to distinguish between their
own DNA and that of viruses. To solve
this problem, plant, fungal and animal
cells store their DNA in the nucleus, and
respond to DNA in the cytoplasm by activat- 
ing an inflammatory response. However, under 
certain abnormal circumstances, host-cell 
DNA can also accumulate in the cytoplasm, 
triggering inappropriate inflammation. On 
page 57, Coquel et al. define a role for the protein
SAMHD1 in preventing this cytoplasmic
host-DNA build-up. The authors’ findings have 
implications for immune disease and cancer. 

SAMHD1 is a nuclear protein that chemically
inactivates nucleotides, preventing them
from being used to build DNA. Inherited
SAMHD1 mutations cause the rare inflammatory
disorder Aicardi-Goutières syndrome,
which involves increased production
of proteins called interferons that activate the 
immune system. Under normal circumstances, 
interferons are produced only in response to 
infection, and help to fight off viruses.

Aicardi-Goutières syndrome can also be
caused by mutations in the enzymes RNase
H2 and TREX1, which degrade specific nucleic
acids. In TREX1-mutant cells, single-stranded
DNA (ssDNA) fragments accumulate in the
cytoplasm and activate a DNA-sensing pathway,
dubbed cGAS-STING, triggering interferon
production. Before the current study,
the cause of chronic inflammation in cells har- 
bouring SAMHD1 mutations was unknown. 
Coquel et al. found that, as in TREX1-mutant 
cells, ssDNA accumulates in the cytoplasm of 
SAMHD1-deficient human cells. This leads 
to interferon production, mediated by the 
cGAS-STING pathway. 

Why does SAMHD1 deficiency cause this 
defect? Dynamic structures called replication 
forks form at sites where double-stranded 
DNA is unwound, enabling each strand to 
be duplicated during DNA replication. The 
progression of forks along DNA can slow or 
stall if unusual DNA structures or damaged 
DNA block their paths, or if nucleotide levels
become depleted (Fig. 1a). Several proteins 
then cooperate to circumvent or repair the 
damaged site and restart replication. In some 
cases, degradation of newly synthesized DNA

Figure 1 | SAMHD1 at DNA-replication forks. a, During DNA replication, DNA is unwound and
duplicated to produce newly synthesized DNA at dynamic structures called replication forks. Forks can 
stall during replication if they hit unusual DNA structures (not shown) or DNA that has been damaged, for 
example by ultraviolet (UV) light. b, Coquel et al. report that the protein SAMHD1 directs the nuclease
enzyme MRE11 to degrade newly synthesized DNA — a process called fork resection, which is crucial for 
overcoming stalling. c, In SAMHD1-deficient cells, MRE11 activity is lacking, and the newly synthesized 
DNA at stalled forks is processed by alternative proteins, resulting in the release of single-stranded DNA 
(ssDNA) into the cytoplasm. The ssDNA fragments accumulate and trigger pro-inflammatory responses. 


at stalled forks (a process called fork resection) 
promotes DNA remodelling or activation of 
repair pathways that help to restart DNA rep- 
lication. When the authors artificially induced 
replication-fork stalling in SAMHD1- 
deficient cells, cytoplasmic ssDNA levels and 
interferon responses increased. Moreover, 
the group demonstrated that the cytoplasmic 
ssDNA fragments were newly replicated, 
implying that they had been released from 
the stalled fork. Thus, SAMHD1 normally 
blocks cytoplasmic ssDNA accumulation by 
preventing the release of ssDNA from stalled 
replication forks. 

Previous work, confirmed by the current
study, has shown that SAMHD1 regulates 
nucleotide levels so that replication forks can 
progress efficiently across the genome. But 
Coquel et al. found that SAMHD1 also has a 
second role in fork progression and process- 
ing — directly binding to and activating the 
nuclease enzyme MRE11, which degrades 
nascent DNA during fork resection (Fig. 1b).
The authors demonstrate that it is this activ- 
ity of SAMHD1 that prevents newly replicated 
DNA from accumulating in the cytoplasm. 
When cells lack SAMHD1, meaning that 
MRE11 is either absent from stalled forks
or enzymatically inactive, other enzymes
aberrantly process the nascent DNA, producing
fragments that move to the cytoplasm (Fig. 1c).

Several key steps in this previously
undescribed SAMHD1 pathway require fur- 
ther investigation. For example, how does 
SAMHD1 bind to forks and activate MRE11? 
What part of the replication fork is processed 
by SAMHD1 and MRE11 during resection? 
Of particular interest is the interplay between 
SAMHD1, MRE11 and other molecules that
control fork resection, such as proteins encoded 
by the BRCA1 and BRCA2 genes, the loss of 
which promotes certain cancers. BRCA-defi- 
cient cells exhibit uncontrolled fork resection 
by MRE11, resulting in over-digested DNA,
which could compromise the integrity of the 
genome and contribute to cancer development. 
However, this feature of BRCA-deficient cells 
can also make them sensitive to chemotherapy.
This treatment induces DNA damage
and increases replication-fork stalling — dur- 
ing chemotherapy, BRCA-deficient cells might 
therefore be selectively killed, whereas nor- 
mal cells are spared. The level of SAMHD1 in 
BRCA-deficient cells might be an indicator of 
patient responses to chemotherapy, with high 
levels increasing the activity of MRE11 and
exacerbating replication defects in tumour
cells undergoing chemotherapy, and low levels 
indicating treatment resistance. 

SAMHD1 is often mutated in leukaemias 
and solid tumours. Although altered nucleotide
levels might well perturb DNA replication
and contribute to tumour development when 
SAMHD1 is mutated, Coquel and colleagues’ 
research provides an alternative explanation. 
Fork-resection defects in SAMHD1-deficient
cells might lead to increased problems with 
DNA replication — a phenomenon central to 
cancer development.

It remains unclear whether cGAS-STING 
activity, such as that induced by SAMHD1 
deficiency, promotes or prevents tumour 
formation. On one hand, increased levels of 
cytoplasmic nucleic acids and cGAS-STING 
activation can signal potentially danger- 
ous replication problems in abnormal cells, 
and thereby promote their elimination by 
the immune system. Indeed, cGAS-STING
immune signalling is suppressed in some 
cancers. On the other hand, persistent cGAS- 
STING signalling can lead to a chronic pro- 
inflammatory response, which can promote 
tumour development and spread. Further
work is required to better understand the role 
of these immune responses in cancer. 

Coquel et al. also advance our understanding 
of the causes of immune disease by shedding 
light on the important question of whether the 
different mutations associated with Aicardi-
Goutières syndrome promote disease through
a common mechanism. Although cells lack- 
ing SAMHD1, TREX1 and RNase H2 all 
trigger interferon responses through cytoplas- 
mic cGAS-STING signalling, there is some 
evidence that the activators of this pathway 
might be distinct. For SAMHD1 and TREX1, 
there are now links to ssDNA produced during 
DNA replication, but a role for other cytoplasmic
nucleic acids is yet to be ruled out. For
RNase H2, cGAS-STING is induced by DNA 
derived from micronuclei: small, aberrant
nuclei that form when chromosomes fail to 
segregate properly into sister cells during cell 
division. Although double-stranded DNA can 
activate cGAS-STING, ssDNA might also be
present in micronuclei and thus contribute to 
activation of the pathway. It will be interesting 
to further define exactly which nucleic acids 
drive this syndrome. 

Finally, it remains unclear how nucleic acids 
are released into the cytoplasm to activate the 
cGAS-STING pathway. One possibility is that 
they escape the nucleus after the surround- 
ing nuclear envelope breaks down during cell 
division. However, Coquel et al. found that
cytoplasmic ssDNA accumulates rapidly in 
SAMHD1-deficient cells, even before division, 
suggesting that other pathways are involved. 
Understanding how the pathological build-up 
of nucleic acids in the cytoplasm of cells occurs 
might help us to identify molecular targets 
that have the potential to be therapeutically 
manipulated in immune disease.

Helium discovered in
the tail of an exoplanet 


As the exoplanet WASP-107b orbits its host star, its atmosphere escapes to form a
comet-like tail. Helium atoms detected in the escaping gases give astronomers a 
powerful tool for investigating exoplanetary atmospheres. SEE LETTER P.68 


DRAKE DEMING 


Helium is ubiquitous in the Universe.
Large amounts were generated in
the Big Bang, and nearly every star
begins its life by producing helium in its core 
through the nuclear fusion of hydrogen. The 
atmospheres of giant exoplanets are expected 
to have an abundance of helium, because these
planets formed from recycled gas and dust 
from a previous generation of stars. However,
searches for helium in such atmospheres have 
been unsuccessful. On page 68, Spake et al.
report the discovery of helium atoms in the
eroding atmosphere of the giant exoplanet 
WASP-107b. Their work opens a new chapter 
in the study of exoplanetary atmospheres. 
WASP-107b is of comparable size to Jupiter, 
but has about one-eighth the mass. The exo- 
planet’s low mass relative to its substantial 
size makes it difficult for the planet to retain 
its atmosphere — especially in the presence of 
strong ultraviolet radiation from its host star. 
Although this star is smaller and cooler than 
the Sun, it is threaded with magnetic fields 
produced by the star. Contortions of these 




Figure 1 | The escaping atmosphere of WASP-107b. As the giant exoplanet WASP-107b orbits its host 
star, ultraviolet radiation from the star energizes the planet’s atmosphere. Spake et al. show that this
causes the atmosphere to escape, and to form a gaseous tail. The authors detected helium atoms in the 
escaping gases. This is the first time helium has been identified in an exoplanetary atmosphere.

fields emit ultraviolet radiation that energizes
the planet’s atmosphere. 

Spake et al. observed WASP-107b using a 
camera on board the Hubble Space Telescope, 
and concluded that the planet’s atmosphere 
escapes to form a comet-like tail (Fig. 1). 
Astronomers have long known that giant planets
can lose their atmospheres in this fashion,
so this aspect of Spake and colleagues’ work is 
not surprising. But the authors have added a 
key twist to the story. Until now, only hydro- 
gen (the main component of giant planets) 
and a few elements with low abundances
have been identified in eroding exoplanetary 
atmospheres. 

Atoms in the gaseous tail of an exoplanet are
most easily detected when they absorb stellar 
light during a transit — a passage of the planet 
in front of its host star. However, atoms in such
a tenuous tail have a tendency to relax to their 
lowest-energy (ground) state. In this state, 
most atoms absorb mainly ultraviolet light, 
and measuring such absorption is difficult for 
two reasons. 

First, Earth’s atmosphere is opaque to most 
ultraviolet light, which means that absorp- 
tion measurements must be made from space. 
Currently, only Hubble has the capability for 
ultraviolet studies of exoplanetary atmos- 
pheres, and this telescope could reach the 
end of its mission lifetime in the next decade. 
Second, the pattern of how much ultraviolet 
stellar light is absorbed by transiting planets 
as a function of time or wavelength tends to 
be complex. Such complexity makes it difficult 
to interpret ultraviolet measurements of a 
transiting planet’s atmosphere. 

Fortunately, helium atoms have a long-lived 
(metastable) state, in addition to the ground 
state. Metastable helium atoms absorb near- 
infrared stellar light, which has a wavelength 
only slightly beyond the limits of human 
vision. Measurements at this wavelength are 
much easier to interpret than those at ultra- 
violet wavelengths. 

Spake and colleagues observed a transit of 
WASP-107b, and measured the amount of 
near-infrared stellar light that was transmitted 
through the planet’s eroding atmosphere as a 
function of wavelength. The authors identified 
a narrow absorption feature that they associ- 
ated with metastable helium atoms (see Fig. 1 
of the paper). This signal is more than five
times greater than any false signal that could 
be produced by stellar activity. 

Detecting helium in the escaping atmos- 
pheres of other exoplanets will be difficult 
because the absorption signal is intrinsically 
weak, especially for planets smaller than 
WASP-107b. However, astronomers will 
eagerly rise to the challenge. The near-infrared 
signature of metastable helium is readily trans- 
mitted through Earth’s atmosphere, which 
means that eroding exoplanetary atmos- 
pheres could be probed using ground-based 
telescopes. The advent of a new generation of 
extremely large telescopes at ground-based
observatories will allow astronomers to study
the escaping atmospheres of planets as small 
as Neptune, which has a radius four times that 
of Earth. 

Theorists have predicted that the atmos- 
pheres of Neptune-sized exoplanets could be 
rich in helium, owing to differences in the
rates at which hydrogen and helium are lost 
to space. Like other giant planets, these bodies 
are thought to start out with atmospheres of 
predominantly hydrogen, abundant helium 
and smaller amounts of elements heavier 
than helium. As their atmospheres escape, 
hydrogen is lost fastest, leading to a gradual 
relative enrichment in the helium content of 
the atmosphere. 

Heavier elements such as carbon and oxygen 
would be slow to escape, and could in prin- 
ciple be present in exoplanetary atmospheres 
in concentrated amounts. These heavier ele- 
ments are key to understanding both how 
planets form and how they acquire their 
atmospheres. For planetary astronomers, 
an escaping atmosphere that is rich in heavy
elements is something of a cosmic treasure,
providing ample scientific opportunities to 
study planetary formation and evolution. 
Spake and colleagues’ detection of helium in 
WASP-107b will enable astronomers to look
for atmospheres that are rich in helium, and 
perhaps in heavier elements, thereby opening 
a new subfield of exoplanetary science.


Evolutionary insights 
from an ancient bird 


Ichthyornis dispar is a key extinct bird species from when birds were shedding 
characteristics of their dinosaur ancestors and evolving their current features. A 
reconstructed skull of I. dispar now illuminates this transition. SEE LETTER P.96 


KEVIN PADIAN 


The distinctive features of birds, from
beaks to feathers, provide a stark
separation between avians and other
animal groups. But how did the features of 
the bird skull evolve? On page 96, Field et al. 
present a computerized reconstruction of the 
skull of a pivotal early bird that brings avian 
evolution into sharper focus. 

In the late 1800s, the palaeontologist Othniel 
C. Marsh and his field crews made many of the 
first reported discoveries of ancient dinosaurs 
and mammals from western North America, 
amassing a fossilized ‘bestiary’ that dwarfed 
what was then known from Europe. Marsh’s
treasures were constantly in the headlines, perhaps
never more so than when he published
his monograph Odontornithes in 1880, which 
reported several previously undescribed fossil 
birds of the mid-Cretaceous period (around 
80 million to 87 million years ago) from the 
shores of Kansas and nearby states. Familiar 
yet strange in many ways, these creatures were 
so archaic that they retained teeth and sub- 
stantial bony tails, thus providing clues to the 
reptilian origin of birds. When Charles Darwin
received a copy of the monograph from Marsh,
the letter that he wrote back to Marsh said: 
“Your work on these old birds and on the many 
fossil animals of N. America has afforded the 
best support to the theory of evolution, which 
has appeared within the last 20 years” (see 
go.nature.com/2hhjxrd). 

The specimens Marsh presented in 
Odontornithes were predominantly from 
two contrasting bird genera: Hesperornis, 
which was flightless and essentially wingless, 
standing 1.3-1.8 metres tall and comparable 
to today’s loons, and a tern-like bird called 
Ichthyornis, which had an average wingspan 
of about 60 centimetres (ref. 3). However, 
neither was closely related to living loons or 
terns. Both birds had many sharp, curved 
teeth, which were absent only from the front 
part of the upper jaw, and their beaks were 
covered by a horny sheath. Unfortunately, the 
excavated bones, being small, fragile and of an
elaborate architecture, were badly crushed, and 
proved challenging to prepare. The restoration, 
mounting and illustration of the specimens 
were, shall we say, somewhat overenthusi- 
astic. The specimens could be convincingly 
described only after the mounts had been
disassembled and prepared afresh more than
a century later.

Fast forward to the twenty-first century, 
and in the past 20 years some of the most sen- 
sational dinosaur discoveries have been the 
seemingly endless reports of ‘feathered’ dino- 
saurs and newly identified early birds, mainly 
from Cretaceous deposits in China. These 
specimens are closer to Archaeopteryx (the 
earliest known bird, from the Late Jurassic of 
Germany about 145 million years ago) than to 
Hesperornis and Ichthyornis. These discoveries
have shown that the evolution of feathers,
from hair-like down to flight feathers, broadly 
paralleled the sequence of development of the 
features of a single feather in living birds. Such
insights suggest a plausible sequence for the 
evolution of wings and flight in birds, whereby 
newly hatched ancient dinosaurs flapped their 
incipient wings as a way of boosting their 
ability to scale steep inclines when evading 
predators.

But many questions persist about the 
anatomical changes in early bird evolution, and 
this is where the work of Field and colleagues 
comes in. Present-day birds have skulls that are 
different in many ways from those of all other 
animals, including the dinosaurs from which 
they evolved. Bird snouts are lightweight, usu- 
ally narrow and sometimes quite long. Indeed, 
the bones of the bird snout are relatively light 
and fragile compared with those of other ani- 
mals, and these structures are covered by a 
strong beak made of the protein keratin, which 
enables birds to access various foods, such as 
seeds or carcasses. Inside the beak is a complex 
of bones that corresponds to the human palate; 
but unlike ours, the bird bones have mobile 
connections to each other and to the surround- 
ing skull and jaw bones. This system of mobil- 
ity is an elaboration of the basic dinosaurian 
one, and is key to accommodating the diverse 
feeding habits of birds. 

Moreover, ‘bird brain’ is not the insult you 
might think. Bird brains are larger relative to 
their body size than is the case for reptiles, and 
the relative size of bird brains is comparable to 
that of placental mammals. As birds evolved 
from their dinosaur ancestors, the bones that 
protect the brain enlarged to keep pace with 
the changes in brain size. The bones of the skull 
roof and cheek region are also comparatively 
larger than the equivalent structures in their 
dinosaur ancestors, whereas the adductor 
muscles of the bird jaw are reduced. But in 
what order did these features evolve, and how 
did they shape avian evolution? 

Ichthyornis is closely related to living 
birds, but retains many features of the 
earliest birds. No Ichthyornis skull material had 
been uncovered since Marsh's discoveries in the 
1870s. But Field et al. describe four new three- 
dimensionally preserved specimens with skull 
remains, and they image them in 3D, along 
with some overlooked skull bones from Marsh's 
original specimens. The authors used a standard
technique called high-resolution computed

Figure 1 | Skull of the bird Ichthyornis dispar. Field et al. report the reconstruction of the skull of
an extinct species. Their reconstruction fills in some structures missing from previously available
fossils, thereby illuminating the transition between the loss of ancient dinosaur features and the 
evolution of characteristics found in present-day birds. The sections in yellow are newly identified 
fossil material, whereas the grey structures have been described previously. a, A side view of the skull. 
b, A view from above the skull (beak positioned on the left) showing cross-sections in two focal planes. 
The section above the black line is closer to the top of the skull than the region below the black line. 
Scale bar, 1 centimetre. (Adapted from Extended Data Fig. 2 of ref. 1.) 


tomography, in which a reconstruction of each 
bone is compiled by taking extremely thin 
cross-sectional images all the way through 
the bone, like slicing a salami sausage and 
reassembling it. This enables the internal 
anatomy and outer shape to be visualized. The 
images of all the separate bones are assembled,
and a computer program enables the bone 
images to be manipulated, allowing analysis of 
how the bones might have moved. 

The resulting skull images (Fig. 1) show that 
the beak of Ichthyornis has some features that 
place it between the earliest birds and living 
birds: the beak was small, had not yet evolved a 
bony shelf structure in the palate and was lim- 
ited to the tip of the jaw. However, the probable 
mobility of the Ichthyornis skull seems to be 
more like that of living birds. The brain would 
have been much like those of today’s birds, but 
the cheek region, bounded by bones of the 
skull roof and the side of the skull, has charac- 
teristics that are closer to those of dinosaurs, 
such as the retention of a large bony chamber 
for the adductor muscles that close the jaw. 
Therefore, several key features of the brain and 
palate evolved before the jaw muscles became 
reduced and the familiar features of the beak 
of living birds evolved. 

This study raises many questions that 
remain to be answered. For example, were 
there functional changes that went along with 
reducing the jaw muscles from the ancestral 
dinosaurian condition? Did this change reflect 
a change in diet? And what ecological habits 
are correlated with the loss of teeth from the 
front part of the upper jaw and the evolution 
of the horny beak that covers it? Hesperornis 
was probably a diver that hunted fishes and
invertebrates in the water column, whereas
Ichthyornis seems to have been more a surface 
skimmer or perhaps a shallow plunger like a 
tern or gull. How did these different predatory
approaches favour the same pattern of
tooth reduction, which also happened inde- 
pendently in other early bird groups? How did 
the mobility of the bones of the palate against 
adjacent skull bones in Ichthyornis compare 
with the ranges of motion in the palates of 
dinosaurs and living birds, and what might 
these evolutionary changes suggest about the 
diet and mode of feeding of Ichthyornis? 

Whatever the answers to these questions 
turn out to be, Field and colleagues’ beauti- 
fully rendered 3D scans and reconstructions 
of this iconic fossil avian, along with their 
comparisons of these structures with those of 
earlier and later birds, provide an important 
resource to aid our understanding of early bird 
evolution.

SUPRAMOLECULAR CHEMISTRY


A perfect marriage
of materials


An absorbent gel has been integrated into the void space of a protein crystal to 
yield a remarkable self-healing material — it recovers its molecular order after 
several cycles of expansion and contraction. SEE LETTER P.86 


FRANÇOIS BANEYX


In the best marriages, each partner brings
attributes to the union that makes it both
stronger and greater than the sum of its
parts. On page 86, Zhang et al. have achieved
such a feat in the field of materials science. They
have wed protein crystals to polymeric gels to
create a self-healing hybrid material that can
undergo multiple cycles of hyperexpansion and
contraction without losing its ability to return
to its original crystalline state. Such dynamic
assemblies hold great promise for applications
such as sensors, separators and actuators
(devices that convert energy into movement),
but have thus far been difficult to produce.

The ordered component of the wedded
couple is ferritin, a protein that is ubiquitous
in nature. Ferritin stores a mineralized form
of iron called ferrihydrite in its hollow core,
and releases iron(II) ions (Fe²⁺) when they are
needed by an organism. Each ferritin molecule
is made up of 24 subunits that self-assemble
to form a nearly spherical, cage-like structure.

Zhang and colleagues used a variant of
ferritin, which had been engineered as previously
reported, to enable amino-acid side
chains on two ferritin molecules to bind to
calcium ions, forming bridges between the 
molecules. Each ferritin molecule connects 
in this way to 12 of its neighbours, enabling 
the protein to grow as cubic crystals that 
have long-range order extending to tens of 
micrometres. 

The authors wanted to combine the crystal 
with a flexible component, and chose a poly- 
meric hydrogel — a water-absorbing network 
of crosslinked polymers — as a suitable candi- 
date. To bring the two components together, the 
authors first soaked the crystals in a solution 
of small hydrogel precursors; this allowed the 
precursors to diffuse throughout the water- 
filled void space of the ferritin lattice. They then 
polymerized the precursors by transferring the 
crystals to a solution of polymerization initia- 
tors that was highly saline to prevent undue 
crystal swelling by water infiltration. A contin- 
uous and elastic hydrogel network formed in 
less than two minutes in the void space, which 
was infiltrated soon after by the salt solution. 

When the resulting composites were placed 
in water, they expanded isotropically (equally 
in all directions) to about 200% of their original 
size within minutes. The crystals did not lose 


Figure 1 | Expansion and contraction of a protein-hydrogel composite. Zhang et al. have prepared
a composite material in which a hydrogel (a water-absorbing network of crosslinked polymers) fills the 
void spaces in the crystal lattice of the almost-spherical ferritin protein; the lattice is held together by 
calcium ions (not shown) that form bridges between ferritin molecules. When the crystals are soaked 
in water, they expand equally in all directions as the hydrogel hydrates. Most of the order in the lattice 
is lost, but some long-range order is retained in the diagonal direction shown (the (111) plane). When 
the expanded crystals are soaked in a solution of concentrated sodium chloride (which dehydrates the 
hydrogel) and then in a calcium chloride solution (which re-forms bridges between ferritin molecules), 
the crystals contract back to their original size, and lattice order is restored.

their faceted morphology, or release ferritin to
the bulk medium, for at least 50 minutes. How- 
ever, continued expansion eventually led to the 
loss of detectable edges of the crystals. 

More strikingly, when Zhang and colleagues 
transferred partially expanded crystals to a con- 
centrated solution of either sodium chloride or 
potassium chloride, and then incubated them in 
a concentrated solution of calcium chloride, the 
crystals contracted through dehydration, recov- 
ering both their original size and their lattice 
structure (Fig. 1). This expansion-contraction 
process could be repeated at least eight times 
without a noticeable change in ferritin order- 
ing. In fact, the authors observed that X-ray 
structures of ferritin-hydrogel hybrids that 
had undergone a single expansion-contraction 
cycle were of higher resolution than could be 
achieved using conventionally prepared ferritin 
crystals that lacked the hydrogel. This suggests 
that the lattices of such hybrids are more pre- 
cisely ordered than those of conventionally 
produced ferritin. Polymer infusion might thus 
be a useful approach to improve the quality of 
other protein structures, or to access alternative 
structural states of proteins. 

Several additional observations are worth 
highlighting. First, Zhang et al. found that not 
all hydrogels are equally good at supporting 
the expansion-contraction process, because 
extensive electrostatic and hydrogen-bonding 
interactions are required between polymer 
side chains and ferritin molecules for order to 
be recovered. Second, the reversibility of the 
process is not perfect: calcium ions help the 
ferritin lattice to ‘snap back into place’, both
by screening undesirable electrostatic inter- 
actions and by bridging ferritin molecules, 
but the structure of about half of these bridges 
changes after an expansion-contraction cycle. 
Third, the polymer-infused crystals often 
cracked when subjected to abrupt expan- 
sion or contraction. However, these fractures 
were swiftly repaired because the polymeric 
network dynamically interacted with, and 
rearranged around, the ferritin molecules, 
even though the hydrogel was not engineered 
to have intrinsic self-healing properties.

Finally, the authors showed that neither 
crystals loaded with ferrihydrite, nor crystals 
made using fluorescently labelled ferritin, were 
impaired in their ability to undergo isotropic 
expansion-contraction when infused with 
hydrogel. They also demonstrated that the 
expansion and contraction of such crystals was 
unaffected when a shell of ferritin was grown 
on them and the assembly was infused with 
hydrogel. By contrast, when a ferritin shell was 
grown on a core of fluorescently labelled ferri- 
tin crystals whose molecules had been chemi- 
cally crosslinked to prevent expansion, and 
hydrogel was incorporated into the core-shell 
structure, the shell shattered on treatment of 
the crystals with water, owing to the generation
of a lattice mismatch between the core and
shell as the shell expanded. 

© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

The internal cavity of ferritin has been used
as a nanoscale reaction vessel in which to produce
mineralized nanoparticles that have optical,
catalytic or magnetic activity, and packing
nanoparticles into ordered assemblies leads to 
the emergence of collective behaviours that 
have found applications in opto-electronics, 
medicine and sensing. Zhang and colleagues’
findings might therefore provide a useful 
method for fine-tuning such collective phe- 
nomena by enabling controllable and revers- 
ible structural ordering in 3D nanoparticle 
arrays. Success will hinge on the identity and 
properties of the particles produced in the 
cavities of ferritin molecules, on the separa- 
tion distances and ordering that the protein 
structure and crystal lattice impose on these 
nanoparticles, and on whether the kinetics of 
polymer expansion and contraction can be 
accelerated or otherwise precisely controlled. 
Beyond ferritin, the use of other natural 
protein cages’ — and, more excitingly, of 
synthetic protein cages designed from scratch”
— should provide the versatility needed to 
control nanoparticle separation distances and 
lattice systems. Furthermore, the develop- 
ment of creative polymer chemistry will aid 
efforts to modify the kinetics of expansion and 
contraction.


Molecular machines 


Swap rings 


A chemical system has been made in which two rings on an axle can switch 
places by allowing a smaller ring to slip through the cavity of a larger one. The
advance opens up potential applications in molecular data storage. 


STEVE GOLDUP 


Many of the synthetic molecular
machines’ that have been developed
in the past 40 years are based
on rotaxanes: molecules in which a ring-shaped
component encircles a linear axle that
is terminated with large ‘stoppers’ to prevent 
the ring from slipping off. The threading of 
the axle through the ring limits the motion 
of the ring to shuttling back and forth along 
the axle. Such shuttling has been used in a
range of molecular machines that includes 
switches’, ratchets’, pumps‘ and small- 
molecule synthesizers’. Rotaxanes in which 
more than one ring encircles the axle have 
also been made’, reminiscent of an abacus, but 
the rings have been unable to switch places. 
Writing in Nature Chemistry, Zhu et al.’ now 
report a system in which the rings can slip past
one another, opening the way to new types of 
molecular machine. 

To achieve a ring-through-ring shuttling 
motion, Zhu and colleagues assembled a 
rotaxane that contains two differently sized 
rings (Fig. 1). One has a circumference of 
24 atoms, which is about as small as a ring can be 
in a rotaxane, whereas the other is almost twice 
as large at 42 atoms. Both rings form hydrogen 
bonds with nitrogen-hydrogen (N-H) units of
the axle, and this enabled the authors to probe
the rings’ movement using nuclear magnetic 
resonance (NMR) spectroscopy. 

At room temperature, the authors observed 
two distinct N-H signals in the NMR spectrum
of the rotaxane, because the signal for
an N-H unit that is bonded to the small ring 
appears at a different frequency from that of an 
N-H unit bonded to the larger ring. This told 
Zhu and co-workers that the rings exchange 
places slowly at this temperature, or not at all. 
However, as the sample of rotaxane was heated, 
the signals began to broaden and then merged 
into a single peak. This finding confirmed that 
the rings change places quickly at elevated 
temperatures. The only way that this could 
have occurred is by the smaller ring passing 
through the larger one. 

Zhu et al. determined that, at room tem- 
perature, the energy barrier that must be over- 
come for the rings to change places is about 
52 kilojoules per mole of rotaxane, which cor- 
responds to a shuttling rate of about 3,600 times 
per second. For comparison, in an analogous 
rotaxane that contains only the smaller ring, the 
ring hops between the N-H groups approxi- 
mately 80,000 times per second, or roughly 
20 times faster. On the basis of this comparison,
the authors estimate that the ring-through-ring
movement ‘costs’ about 12 kilojoules per
50 Years Ago


The British General Post Office 

is busy organizing a “telephone 
fortnight” in an attempt to silence 
the public criticism of its services. 
So far, the promotion has given 
everybody a chance to tell their 
favourite telephone stories, most 

of them unflattering to the GPO. 
The GPO’s timing was inept; it is 
only two weeks since it announced 
increases in postal and telephone 
charges, and it might have been 
better to let the hubbub settle 

down before organizing the 
campaign ... In the next three years, 
the GPO is intending to spend 
£1,100 million on investment in 
telecommunications ... In the longer 
term, the GPO should be wondering 
how to increase the number of 
subscribers ... Britain still has very 
few telephones — 183 telephones 
per 1,000 of population. 

From Nature 4 May 1968 


100 Years Ago 


Students of animal behaviour 

will find some interesting facts 

on the “drumming” of the ruffed 
grouse ... in Forest and Stream 

for April, illustrated by a series of 
remarkable photographs, probably 
the first of the kind which have 
ever been taken. The author, 

Mr. E K. Vreeland, had the good 
fortune to watch at close range one 
of these birds while “displaying”, 
and he is convinced that the 
strange drumming sound then 
made is produced by the use of the 
wings alone. This may indeed be 
the case, but we suspect that later 
investigations will show that these 
sounds are at least partly vocal ... 
The author is apparently so much of 
an “outdoor naturalist” that he has 
never read any of the voluminous 
literature on this theme of courtship 
displays. But in some respects this 
adds rather than detracts from the 
value of his observations, since his 
records are made without bias. 
From Nature 2 May 1918 
Figure 1 | Ring-through-ring shuttling of a rotaxane. a,b, Zhu et al.’ report a molecule known as a rotaxane that consists of two rings, one much larger than the
other, threaded on to an axle. Large groups, known as stoppers, at the ends of the axle prevent the rings from slipping off (Ph, phenyl group). c, The authors find 
that the rings can slip past each other to exchange their positions on the axle. Shuttling takes place by means of the smaller ring passing through the larger one. 


mole: a considerable amount, but not as high
as might have been expected. 
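
The link between an energy barrier and a shuttling rate can be illustrated with the Eyring equation from transition-state theory. The sketch below is not taken from Zhu and colleagues' analysis: it assumes a transmission coefficient of 1 and a temperature of 298.15 K, and the function names are hypothetical. Under those assumptions, a barrier of 52 kilojoules per mole gives a rate of a few thousand events per second, the same order as the figure quoted above; exact agreement should not be expected, because the authors' temperature and fitting procedure may differ.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # molar gas constant, J/(mol*K)


def eyring_rate(barrier_kj_per_mol, temperature_k=298.15):
    """Rate (per second) for a free-energy barrier, assuming transmission coefficient 1."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kj_per_mol * 1e3 / (R * temperature_k))


def eyring_barrier(rate_per_s, temperature_k=298.15):
    """Inverse of eyring_rate: barrier in kJ/mol for a given rate."""
    prefactor = K_B * temperature_k / H
    return R * temperature_k * math.log(prefactor / rate_per_s) / 1e3


# Barrier quoted in the text (about 52 kJ/mol); temperature is an assumption.
rate = eyring_rate(52.0)
print(f"rate at 298.15 K: {rate:.0f} per second")
```

Because the rate depends exponentially on the barrier, a change of a few kilojoules per mole shifts the rate by roughly an order of magnitude at room temperature.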

The two-ringed rotaxane is remarkably 
simple compared with some of the synthetic 
molecular machines that have been produced 
so far. It might therefore seem surprising that 
this is the first time that a ring-through-ring 
shuttling process has been observed. How- 
ever, to achieve their breakthrough, Zhu and 
colleagues had to bring together several key 
structural features. 

First, the dramatic difference in the size of 
the rings is important for enabling shuttling to 
occur. Indeed, the authors produced another 
rotaxane analogue in which the larger ring 
was 12 atoms smaller and found that no ring- 
through-ring shuttling occurs, even at elevated 
temperatures. 

Second, when large rings are used as 
components of rotaxanes, the stoppers on the 
ends of the axle must be extremely large to pre- 
vent the rings from slipping off. This demand 
can complicate the synthesis of rotaxanes 
because larger stoppers often cause problems 
with solubility, and their use typically adds 
further steps to an already complex synthetic 
route. Zhu et al. overcame these issues by using 
simple T-shaped stoppers that they had devel- 
oped previously to make porous materials 
(known as metal-organic frameworks) that 
incorporate rotaxanes*. 

Such structural issues highlight a limitation 
of the newly identified dynamic process: if 
rotaxane structures that show ring-through- 
ring shuttling must be so contrived, will it be 
possible to use ring-through-ring shuttling to 
develop molecular machines? It is to be hoped 
that the answer is ‘yes’, because ring-through-ring
shuttling could bring an extra dimension
to rotaxane-based switches. A potential appli- 
cation suggested by the authors is molecular- 
level weaving, in which ring-through-ring 
shuttling controls the entanglement of 
molecular threads — essentially, a small ring is 
used to pull a molecular chain through a larger 
ring, akin to threading a macroscopic needle.

Multi-ring rotaxanes could potentially also 
be used for information storage, in which data 
are encoded by the order of the rings on the 
axle. Before now, there was no major advantage 
to this approach compared with storing data 
in simpler molecules, because the ring order 
was fixed at the time of synthesis®. Zhu and 
colleagues’ work opens up the possibility of 
using external stimuli to order and reorder the 
rings, and therefore of writing and rewriting 
any encoded information.
Life of a liver awaiting
transplantation 


People waiting for a liver transplant can die before an organ is found, or, if one is 
available but of poor quality, there is a risk of transplant failure. A machine that 
preserves livers might offer a way forward. SEE ARTICLE P.50 


STEFAN SCHNEEBERGER 


The standard approach for handling
donated livers before transplantation
is storing them on ice. On page 50,
Nasralla et al.' report the results of a clinical 
trial that compared two organ storage meth- 
ods. More than 200 people who received a 
liver transplant were randomly allocated 
either a donor liver that had been stored on 
ice or one preserved with the aid of a machine 
that perfuses the organ at body temperature
(37°C) with oxygenated blood containing
nutrients (Fig. 1). The latter method is called 
normothermic machine perfusion (NMP), 
and this technique enables organ function 
to be monitored outside the body before 
transplantation. 

The concept of machine perfusion of an 
organ awaiting transplantation is not new. 
Indeed, machine-assisted perfusion was in 
use before cold storage became the method of 
choice owing to its simplicity and reproducibil- 
ity’. However, interest in revisiting perfusion 
as a transplant approach has been gaining
momentum. 

The main outcome monitored in the latest 
trial was the post-transplantation level of the 
enzyme aspartate transaminase in patients’ 
blood. This measurement is commonly used 
to assess liver damage and to estimate the risk 
of transplant failure. The authors found that 
the use of NMP was associated with less liver 
damage than that found in livers preserved on 
ice. Moreover, preservation by NMP reduced 
the number of organs that were discarded 
as unsuitable for transplantation compared 
with livers preserved on ice, and was associ- 
ated with a better blood-flow profile in the 
recipient. 

The liver’s bile ducts can be a point of 
vulnerability for transplant success, and 
whether NMP has a positive effect on the 
viability of these ducts might only be revealed 
after long-term monitoring. It could therefore 
be premature to assert that NMP technol- 
ogy has been shown to be a more effective 
and suitable method for organ storage until 
additional studies can fully determine the 
long-term effect of the NMP approach. But it 
is also fair to say that Nasralla and colleagues’ 
work makes a convincing case for NMP’s 
superiority. Moreover, mimicking the normal 
conditions for an organ outside the body is a 
persuasive idea. This clinical trial represents a 
milestone by directly comparing ice and NMP 
storage approaches. It could pave the way for 
the clinical application of NMP and drive a 
research push in this area. 

Organ storage on ice slows the liver’s 
metabolism, and can result in tissue damage 
by decreasing levels of the energy-storage 
molecule ATP. These alterations result in the
accumulation of harmful reactive oxygen species
and damage to mitochondrial organelles, and
trigger an inflammatory response when
blood flow is restored to the transplanted 
organ’. NMP probably boosts transplant 
success through mimicking the normal 
conditions for the organ and enabling ATP 
replenishment. Such effects would eventually 
limit the generation of reactive oxygen species 
and cell damage. 

Although Nasralla and colleagues’ study 
is convincing regarding the clinical impact 
of NMP, it does not clarify the underlying 
molecular events. Factors including organ 
architecture and cellular composition will need 
to be explored in detail to determine the length 
of preservation times that can be safely used. 
Other clinical trials of machine-assisted organ 
perfusion are under way, testing a range of 
temperatures and conditions’. NMP has been 
performed successfully for 48 hours before 
transplantation in porcine livers’. The median 
duration of NMP in Nasralla and colleagues’ 
study was around 9 hours. 

Pioneering advances in surgical techniques, 
immunosuppressive drug treatment and 
patient care have made organ transplanta- 
tion the standard treatment for chronic organ 
Figure 1 | Supporting the human liver outside the body. Nasralla et al.’ report a clinical trial that
compared methods of liver storage before transplantation. a, In the current standard approach, livers are 
stored on ice in preservation fluid. However, this can lead to a decrease in levels of the energy-storage 
molecule ATP, an increase in harmful reactive oxygen species (ROS) and damage to mitochondrial cellular 
organelles when blood flow returns after transplantation’. b, In an approach termed normothermic 
machine perfusion (NMP), donated livers are maintained at body temperature (37°C) by a machine. 

This device pumps the liver’s deoxygenated blood (dark red) through the machine, and then returns 
oxygenated blood (light red) containing nutrients and essential factors such as bile salts, heparin and 
insulin to the liver. On the basis of analyses of enzyme levels (not shown), the authors found that NMP 
results in less liver damage than that incurred when livers are stored on ice. This might be because 
perfusion results in higher ATP levels and fewer ROS compared with ice storage. c, Machine-based organ 
perfusion could perhaps be adapted to repair liver damage before transplantation. Such an approach might 
require the addition of biological agents such as growth factors, or the introduction of stem cells. 


failure. Yet three fundamental limitations 
remain: organ shortages; a decline in the 
quality of donor organs®; and restrictions on 
the time permitted for handling and trans- 
porting organs. The use of NMP has the 
potential to increase the number of organs 
available for transplantation, and could 
lessen the need to rapidly transplant an organ 
after its removal from the donor’s body, 
enabling more time for assessing the liver. 

Nevertheless, NMP poses certain challenges. 
It might be straightforward to set up the 
machine, prepare the organ for perfusion 
and do basic technical problem-solving, but 
substantial training and experience in using 
this method will be required before it can be 
routinely used in the clinic. Furthermore, the 
travel and care plan for an NMP organ is more 
complex than that for shipment on ice because 
extra steps are needed in the clinical routine. 
Standardized protocols and reporting methods 
need to be established to enable this procedure 
to advance in a controlled fashion. The shar- 
ing of information by medical professionals 
using NMP could help clinicians gain the 
collective experience necessary to minimize 
procedural risks. 

Perhaps the greatest advance from this 
technology is that it might provide proof 
of principle that organs can survive when a 
machine helps to mimic the conditions organs 
encounter in the human body. The human liver 
itself then becomes like a patient. It is perfused, 
monitored and fed by a machine. If it performs
well, it is transplanted immediately; if not, it
might undergo treatment, modification or
repair before transplantation. 

The routine clinical use of NMP might boost 
interest and investment in finding new ways 
to treat, regenerate and recreate organs that 
are supported outside the body. The ability to 
preserve an organ under close-to-normal con- 
ditions could be instrumental in advancing not 
only liver transplantation, but also liver sur- 
gery and organ care. Tissue engineering, like 
transplantation, works towards the goal of pro- 
viding human organs suitable for transplanta- 
tion. As NMP offers the potential to modify 
and monitor an organ, the methods used in 
both fields might start to merge. This idea 
is fascinating, because not only would this 
change many aspects of transplantation, but 
it might also eventually close the gap between 
the two technologies.
Genomic variation in 3,010 diverse
accessions of Asian cultivated rice 
Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza
sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously 
recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 
million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to 
within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length 
protein-coding genes and a high number of presence-absence variations. The complex patterns of introgression observed 
in domestication genes are consistent with multiple independent rice domestication events. The public availability of 
data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding. 


Asian cultivated rice is grown worldwide and comprises the staple food 
for half of the global population. It is envisaged that by the year 2035¹
feeding this growing population will necessitate that an additional 
112 million metric tons of rice be produced on a smaller area of land, 
using less water and under more fluctuating climatic conditions, which 
will require that future rice cultivars be higher yielding and resilient 
to multiple abiotic and biotic stresses. The foundation of the contin- 
ued improvement of rice cultivars is the rich genetic diversity within 
domesticated populations and wild relatives’~*. For over 2,000 years,
two major types of O. sativa have been recognized*’: the O. sativa Xian group
(here referred to as Xian/Indica (XI) and also known as 籼, Hsien or Indica)
and the O. sativa Geng group (here referred to as Geng/Japonica (GJ) and also
known as 粳, Keng or Japonica). Varied
degrees of post-reproductive barriers exist between XI and GJ rice 
accessions’; this differentiation between XI and GJ rice types and the 
presence of different varietal groups are well-documented at isozyme 
and DNA levels’. Two other distinct groups have also been recognized 
using molecular markers"; one of these encompasses the Aus, Boro 
and Rayada ecotypes from Bangladesh and India (which we term the 
circum-Aus group (cA)) and the other comprises the famous Basmati 
and Sadri aromatic varieties (which we term the circum-Basmati group 
(cB)). 

Approximately 780,000 rice accessions are available in gene banks 
worldwide'!. To enable the more efficient use of these accessions
in future rice improvement, the Chinese Academy of Agricultural
Sciences, BGI-Shenzhen and International Rice Research Institute 
sequenced over 3,000 rice genomes (3K-RG) as part of the 3,000 Rice 
Genomes Project”. 

Here we present analyses of genetic variation in the 3K-RG that 
focus on important aspects of O. sativa diversity, single nucleotide 
polymorphisms (SNPs) and structural variation (deletions, duplica- 
tions, inversions and translocations). We also construct a species pan- 
genome consisting of ‘core’ genes that are present in all individuals 
and ‘distributed’ (variable, accessory or dispensable) genes that are 
absent in some individuals!**. The gene presence-absence variations 
(PAVs) represent another component of species genetic diversity. Our 
analyses provide new perspectives on rice intra-species diversity and 
evolutionary history. 


Genome mapping, size and SNP variation 

Baseline genome sequencing, analyses, and accession informa- 
tion and metadata for the 3,024 rice genomes are summarized in 
Supplementary Data 1 and Supplementary Notes. Fourteen acces- 
sions were excluded from further analyses after quality control. The 
remaining 3,010 genomes had an average mapping coverage of 92% 
(74.6-98.7%) (Supplementary Data 2 Table 1), when aligned to the 
O. sativa cv. Nipponbare IRGSP 1.0 reference genome!® (hereafter 
referred to as ‘Nipponbare RefSeq’). The estimated size of the genome was 
Fig. 1 | Unweighted neighbour-joining tree based on 3,010 samples
and computed on a simple matching distance matrix for filtered SNPs. 
Samples are coloured by their assignment to k=9 subpopulations from 
ADMIXTURE*. 


375.1 ± 20.9 Mb, with 42.5 ± 0.4% guanine-cytosine content and 35.6
± 3.7% repetitive sequence content (Supplementary Data 3 Table 1).

We identified over 29 million SNPs—27 million of which are bi- 
allelic—and found high concordance (>96%) with previous reports 
(Supplementary Notes)!®!”. Filtering reduced this to a ‘base SNP’ set 
of approximately 17 million SNPs, which captured >99.9% of all SNPs 
with minor allele frequencies (MAF) > 0.25% (Extended Data Fig. 1). 
More than half (56%) of non-transposable element (NTE) genes and the majority
(91%) of transposable element (TE)-related genes have high-effect 
SNPs (Supplementary Data 2 Tables 2-4). NTE genes contained about 
1.44 million moderate-to-high effect, and about 1.5 million low-effect, 
SNPs, which gave a ratio of 0.95 for moderate-to-high:low SNPs. For 
small indels, insertions affected 28% of NTE- and 50% of TE-related 
genes; deletions affected 41% of NTE- and 70% of TE-related genes.
A typical genome in a major varietal group contains approximately 
2 million (XI and cA), 0.3-0.8 million (GJ; depending on the subpopu- 
lation) or about 1.2 million (cB) SNPs (Supplementary Data 2 Table 5). 
The SNPs of a typical genome were classified as 7.9% moderate-to-high 
effect and 5.1% low effect. 


Population structure and diversity 
The 3K-RG accessions were classified into nine subpopulations (Fig. 1 
and Extended Data Fig. 2a—d), most of which could be connected to 
geographic origins (Supplementary Data 1). There were four XI clusters 
(XI-1A from East Asia, XI-1B of modern varieties of diverse origins, 
XI-2 from South Asia and XI-3 from Southeast Asia); three GJ clusters 
(primarily East Asian temperate (named GJ-tmp), Southeast Asian 
subtropical (named GJ-sbtrp) and Southeast Asian tropical (named 
GJ-trp)); and single groups for the mostly South Asian cA and cB 
accessions. Accessions with admixture components <0.65 within 
XI and GJ were classified as ‘XI-adm’ and ‘GJ-adm’, respectively, and
accessions that fell between major groups were classified as admixed 
(Extended Data Fig. 2a). 
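
The assignment rule described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' pipeline: the 0.65 cut-off and the subpopulation names come from the paragraph above, whereas the exact order of the checks, the use of summed XI or GJ components and the function name `classify` are assumptions.

```python
XI_COMPONENTS = ("XI-1A", "XI-1B", "XI-2", "XI-3")
GJ_COMPONENTS = ("GJ-tmp", "GJ-sbtrp", "GJ-trp")
THRESHOLD = 0.65  # admixture-component cut-off quoted in the text


def classify(proportions):
    """Assign a label from a dict of component name -> admixture proportion.

    Assumed rule: a single component at or above the threshold wins;
    otherwise the summed XI or GJ components decide between 'XI-adm'
    and 'GJ-adm'; anything else falls between major groups ('admix').
    """
    best = max(proportions, key=proportions.get)
    if proportions[best] >= THRESHOLD:
        return best
    if sum(proportions.get(c, 0.0) for c in XI_COMPONENTS) >= THRESHOLD:
        return "XI-adm"
    if sum(proportions.get(c, 0.0) for c in GJ_COMPONENTS) >= THRESHOLD:
        return "GJ-adm"
    return "admix"


# Hypothetical accessions, for illustration only.
print(classify({"XI-1A": 0.90, "GJ-tmp": 0.10}))
print(classify({"XI-1A": 0.40, "XI-2": 0.40, "cA": 0.20}))
print(classify({"XI-1A": 0.50, "GJ-tmp": 0.50}))
```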

Distinct allele frequency profiles for SNPs of MAF > 10% occurred 
for the nine subpopulations with deviations from the neutral model 
reflecting different adaptations and demographic events (Extended
Data Fig. 3a). Larger numbers of ‘private’ alleles were found in 
cA and cB than in other subpopulations (Extended Data Fig. 2e). 
Comparatively, XI subpopulations have smaller numbers of private 
alleles, probably owing to ongoing gene flow from natural hybridi- 
zation and breeding. Doubleton sharing patterns within and between 
subpopulations showed the same trend (Extended Data Fig. 2f). 

Linkage disequilibrium decay rates for combined subpopulations 
were higher in XI than GJ, with little variation between the two GJ 
subpopulations, as previously reported”'*!8. However, for the nine subpopulations,
linkage disequilibrium decay between XI subpopulations
varied more markedly, with XI-2 and XI-3 exhibiting faster linkage 
disequilibrium decay than XI-1A and XI-1B (Extended Data Fig. 3b). 
Furthermore, linkage disequilibrium decay correlates strongly with 
nucleotide diversity (π) among the nine subpopulations (R² = 0.93,
P value = 2.5 × 10⁻⁵) (Extended Data Fig. 3c).

Nucleotide diversity computation identified many regions of low 
genetic diversity that contained small numbers of genes under selec- 
tive constraints (Extended Data Fig. 3d). Sh4’°, which controls non- 
shattering, showed an accordant profile of diversity reduction across 
all subpopulations (Fig. 2a) that indicates much longer selection, when 
compared to qSH1”°. At the semi-dwarf gene sd1*! locus, a narrow 
region of reduced diversity occurred in all major groups, which is 
a similar pattern to that observed for qSH1. However, higher diversity
in the surrounding 100-kb regions occurred in the cA, cB and
XI groups, whereas the GJ groups had extended regions of reduced 
diversity, which reflects the breeding history associated with the ‘green 
revolution’. Different patterns of diversity reduction were observed at 
other important loci. The Wx”? locus that affects amylose content and 
stickiness on cooking, the Badh2.1** locus that affects aroma and their 
surrounding regions are highly diverse in the XI, cA and cB groups, 
which indicates complex histories for selection for different types of 
eating quality; by contrast, both loci and their surrounding regions 
show low diversity in GJ. The Rc?® locus has very low diversity in all 
variety groups, with variable diversity in the surrounding regions in 
XI, cA and cB. 
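
As an illustration of how windowed nucleotide diversity of this kind can be computed, the sketch below uses the standard unbiased per-site estimator for biallelic SNPs and averages it over fixed-size windows. It is not the authors' code: the function names, the input layout and the treatment of each inbred accession as a single haplotype are assumptions, and the 10-kb default merely echoes the window size mentioned in the figure legend.

```python
def site_pi(alt_count, n_haplotypes):
    """Unbiased per-site diversity for a biallelic SNP: 2pq * n / (n - 1)."""
    if n_haplotypes < 2:
        return 0.0
    p = alt_count / n_haplotypes
    return 2.0 * p * (1.0 - p) * n_haplotypes / (n_haplotypes - 1)


def windowed_pi(snps, n_haplotypes, seq_length, window=10_000, step=10_000):
    """Mean per-base diversity in windows along a sequence.

    snps: list of (position, alt_allele_count) with 1-based positions.
    Returns a list of (window_start, pi) tuples; monomorphic sites
    contribute zero, so the window sum is divided by the window length.
    """
    results = []
    for start in range(1, seq_length + 1, step):
        end = min(start + window - 1, seq_length)
        total = sum(site_pi(count, n_haplotypes)
                    for pos, count in snps if start <= pos <= end)
        results.append((start, total / (end - start + 1)))
    return results


# Hypothetical toy data: three SNPs scored in 10 haplotypes on a 30-kb region.
toy_snps = [(1_200, 5), (8_900, 1), (25_000, 3)]
for start, pi in windowed_pi(toy_snps, 10, 30_000):
    print(start, round(pi, 6))
```

A window with no SNPs, or with only rare variants, yields a value near zero, which is the signature of reduced diversity discussed above.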

We compared SNP variation among TE-related genes, NTE-related 
genes, 1,021 genes with validated functions curated in the OGRO/ 
QTARO database*®”’ and a subset of 78 domestication and agronomi- 
cally relevant genes (Supplementary Data 4). Genetic diversity was 
reduced significantly (P value < 107”) near OGRO-curated genes and 
was often more extreme across the 78-gene subset in each subpopu- 
lation (Fig. 2b) when compared with all genomic regions containing 
genes, which suggests there may have been selection for these genes. 


Structural variations 

Structural variations (SVs) were called for 3,010 accessions but we 
focused on 453 accessions with sequencing depths > 20x and mapping 
depths > 15x, because genome coverage stabilized when sequencing 
depths exceeded 20x (Extended Data Fig. 4a, b). We identified 93,683 
SVs, including 582 SVs larger than 500 kb, with an average of 12,178 
SVs per genome. The average sizes of the detected deletions, inversions 
and duplications are 5.3 ± 0.6 kb, 127.1 ± 19.4 kb and 105.1 ± 22.7 kb,
respectively (Fig. 3a, Extended Data Fig. 4c and Supplementary Data 3 
Table 2). 

SVs showed very strong XI-GJ differentiation. On average, each 
XI accession differed from Nipponbare RefSeq by 14,754 SVs (8,990 
translocations, 5,411 deletions, 188 inversions and 165 duplications), 
or 3.5 times as many as in GJ accessions (Fig. 3a). On average, each cA or
cB accession differed from Nipponbare RefSeq by 12,997 SVs and 7,892 
SVs, respectively. The total SV sequence that differentiated two GJ 
accessions was about 22 Mb, whereas it reached 71 Mb between XI and 
GJ accessions (Fig. 3b). Notably, 1,940 SVs disrupted protein-coding 
genes within GJ, whereas >6,518 occurred between XI and GJ acces- 
sions (Fig. 3c). The SV phylogenetic tree (based on 453 accessions) is 
similar to the SNP tree, and clearly separates XI, GJ, cA and cB acces- 
sions (Fig. 3d). Moreover, the 41,957 major-group-unbalanced SVs that 
between subpopulations at the Sh4 locus on chromosome 4 using 10-kb
sliding windows. b, Box plots of the distribution of π in 100-kb regions
surrounding gene models across the genome. Box plots are shown for
k=9 subpopulations for all 100-kb windows (All) (n = 3,728 in total) and
those containing genes annotated as transposable elements (TE) (n = 3,305 


were distributed unevenly among XI, GJ, cA and cB accessions (Fig. 3e) 
accounted for 44.7% of all SVs and 41.0% of the 582 large SVs. 
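
The legend of Fig. 3 states that group-unbalanced SVs were identified with two-sided Fisher's exact tests. A minimal sketch of such a test on a 2 × 2 table is given below, using only the standard library. The function name and the example counts are hypothetical (only the group sizes of 303 XI and 92 GJ accessions come from the legend), and the authors' exact contingency-table layout and multiple-testing treatment are not described in this section.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: an SV present in 280 of 303 XI and 5 of 92 GJ accessions.
p_value = fisher_exact_two_sided(280, 303 - 280, 5, 92 - 5)
print(p_value < 0.05)
```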


Pan-genome and population differentiation 

The widespread SV and genome size variation (Supplementary Data 3 
Tables 1 and 2) encouraged us to investigate the influence of PAVs on 
protein-coding genes across the 3K-RG. We first used a ‘map-to-pan’ 
strategy”® to build the species pan-genome (Extended Data Fig. 5a, b), 
by combining the Nipponbare RefSeq and non-redundant novel de 
novo assembled sequences; then, PAVs were determined by examining 
windows), NTE (n = 3,709), from the OGRO/QTARO database (OGRO)
(n = 828) and the subset of 78 domestication-related genes (AIG) (n = 61
windows). Box plots show the median, box edges represent the first and
third quartiles, and the whiskers extend to farthest data points within 1.5 ×
interquartile range outside box edges.


gene-body and coding sequence (CDS) coverage of mapped reads for 
each accession. 

We identified a total of 268 Mb of non-redundant novel sequences of
length >500 bp with <90% identity to Nipponbare RefSeq from assem- 
blies of the 3,010 genomes, from which 12,465 novel full-length genes 
and several thousand novel genes with partial sequences were predicted. 
Nipponbare RefSeq genes and full-length novel genes could be merged 
into 23,876 gene families. The O. sativa core pan-genome was formed by 
12,770 (53.5%) gene families present in all 453 high-coverage genomes, 
2,056 (8.6%) without significant gene loss >1% (P value > 0.05) 
Fig. 3 | Summary of SVs for the 453 high-coverage rice accessions.
a, Number of deletions, duplications, inversions and translocations.
b, Genome sizes affected by SVs. c, Numbers of genes affected (included or
interrupted) by the SVs. d, Phylogenetic relationship of 453 rice accessions
built from 10,000 randomly selected SVs. e, Characterization of the 42,207
major-group-unbalanced SVs unevenly distributed among XI, GJ, cA and
cB on the basis of two-sided Fisher’s exact tests. Bar plots in a-c are
mean ± s.d. and numbers of accessions in XI, GJ, cA, cB and admix are
303, 92, 33, 10 and 15, respectively.
in all major groups formed candidate core gene families, and the
remaining 9,050 (37.9%) comprised distributed gene families (Fig. 4a, b 
and Supplementary Data 3 Table 3). In silico simulation indicated 
these 9,050 gene families underestimate the distributed pan-genome 
(Fig. 4c). Hence, the O. sativa pan-genome consists of between 12,770 
and approximately 14,826 (53.5% to about 62.1%) core gene families, 
and at least 9,050 (37.9%) distributed gene families: each accession 
contains between 63.4% and about 73.5% core gene families and at 
least 26.5% distributed gene families (Fig. 4b). The core gene families 
have more members (Fig. 4d) and represent essential gene families. 
Indeed, 5,476 (36.9%) core or candidate core gene families are enriched 
in essential functions for growth, development and reproduction (using 
Gene Ontology, GO), whereas only 862 (9.5%) of the distributed gene 
families could be annotated with GO terms, showing enrichment in 
regulation of immune and defence responses and ethylene metabolism 
(Extended Data Fig. 6a, b). 
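
The three-way split into core, candidate core and distributed gene families can be sketched as below. This is a simplification under stated assumptions: the text describes candidate core families as those without significant gene loss above 1% in all major groups, whereas the sketch replaces the significance test with a plain loss-frequency threshold. The function name and input layout are hypothetical.

```python
def classify_family(presence_by_group, max_loss=0.01):
    """Classify one gene family from presence calls grouped by major group.

    presence_by_group: dict of group name -> non-empty list of booleans,
    one per accession (True means the family was detected).
    Assumption: 'candidate core' is approximated by a loss frequency of
    at most max_loss in every group, with no statistical test.
    """
    if all(all(calls) for calls in presence_by_group.values()):
        return "core"
    within_tolerance = all(
        calls.count(False) / len(calls) <= max_loss
        for calls in presence_by_group.values()
    )
    return "candidate core" if within_tolerance else "distributed"


# Hypothetical presence calls, for illustration only.
print(classify_family({"XI": [True] * 3, "GJ": [True] * 2}))
print(classify_family({"XI": [True] * 199 + [False], "GJ": [True] * 100}))
print(classify_family({"XI": [True, False], "GJ": [True, True]}))
```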

Pan-genome sequence coverage was evaluated using two new refer- 
ence genomes”, IR 8 from the XI group and N 22 from the cA group 
average number of gene families that are different between two accessions.
(Supplementary Data 3 Table 4). We found 98.4% of the IR 8 and
98.6% of the N 22 genome sequences could be mapped to the pan- 
genome, whereas only 94.3% and 94.0% could be found in Nipponbare 
RefSeq. Comparison of pan-genome data with high-quality XI
reference genomes of Zhenshan 97 and Minghui 63°” showed that approximately
25% of the novel genes were shorter owing to gene predictions from
fragmented sequences (Extended Data Fig. 5c, d). Novel gene assem- 
blies were validated by mapping raw reads of the 453 high-coverage 
genomes to the 12,465 novel genes; 11,792 genes (94.6%) had >95% 
CDS and >85% gene-body coverages were present in at least two 
rice lines. By comparison, 99.9% of Nipponbare RefSeq annotated 
genes were detected in the 453 high-coverage genomes (Extended 
Data Fig. 5e). Approximately 30% of the full-length novel genes were 
expressed with >1 read per kilobase per million reads in one or more 
of the 226 publicly available RNA sequencing datasets (Extended
Data Fig. 5f, g). Further, benchmarking universal single-copy
orthologues (BUSCO) evaluation suggested little redundancy in predicted
genes (Extended Data Fig. 5h). 

Analyses of the PAVs of genes (or gene families) were able to distinguish
the major varietal groups, and show that there is considerable
variation among and within subpopulations (Extended Data
Fig. 7a-d). On average, major group accessions differ by about 4,000 
(approximately 10%) genes and about 2,000 (approximately 10%) 
gene families, whereas XI and GJ accessions differ by more than 6,144 
(about 14.9%) genes and 2,878 (14.3%) gene families (Fig. 4e and 
Extended Data Fig. 7e). The GJ pan-genome has 23,167 gene fami- 
lies comprising 46,115 genes, which makes it 1.9% smaller than XI in 
terms of gene families and 2.5% smaller in terms of genes. However, 
all GJ accessions have 240 core gene families (1,594 genes) in com- 
mon, four times as many as in XI (Extended Data Fig. 7f). In addition, 
5,733 major-group-unbalanced gene families were more frequent in 
some populations but lower in others, including hundreds of XI- and 
GJ-predominant gene families (Fig. 4f). Moreover, we identified 4,270 
XI and 1,384 GJ subpopulation-unbalanced gene families, showing 
variation between subpopulations within each major group (Extended 
Data Fig. 7g). 


Evolution and domestication of rice 

To gain insights into the evolutionary history of the rice pan-genome, 
gene and gene family ages were estimated by aligning protein sequences 
to the NR protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) 
partitioned into 13 taxonomic levels (Extended Data Fig. 8a, b). We 
observed that: (1) new genes and gene families evolved at alternating 
rates from phylostratum 1 (PS1) (approximately 3.6 billion years ago) 
to the emergence of the terminal PS13 clade containing O. sativa (about 
1.5 million years ago); (2) there was an explosive emergence of new 
genes accompanying the appearance of Oryza at PS12; (3) core genes 
tended to be more ancient, and most novel genes or gene families were 
younger and shorter (Extended Data Fig. 8c, d), consistent with recent 
reports for other species; (4) significantly (P value < 0.001) higher
SNP variation occurred in distributed genes than in core genes (0.0325 
versus 0.0142 SNPs per base) (Extended Data Fig. 8e); and (5) a signi- 
ficantly (P value < 0.001) higher proportion of core genes were under 
negative selection as compared with those in the Nipponbare RefSeq 
(Extended Data Fig. 8f). 

Regarding O. sativa domestication, we constructed haplotype plots 
for nine important domestication genes: Rc, Bh4, PROG1,
OsC1, Sh4, Wx, GS3, qSH1 and qSW5 (Fig. 5a-c and
Extended Data Fig. 9). Although a large number of XI samples carry 
an allele found in GJ, many XI accessions carry alleles at each of these 
loci that are absent in GJ (Fig. 5d). In fact, about 70% of XI accessions 
do not carry GJ introgressions in at least four genes, and only one XI 
sample (out of 1,789) had introgressed GJ haplotypes at all nine genes. 
This observation supports a model of independent domestication of 
some of the XI pool, rather than the simpler GJ-to-XI introgression 
hypothesis. Furthermore, the 14-bp deletion in Rc for domesticated
white pericarp was found in several XI lines that carried non-intro- 
gressed haplotypes (Extended Data Fig. 9), which suggests independent 
selection in part of the XI gene pool before introgression of the GJ 
haplotype became widespread in XI. 


Utility of the 3K-RG panel 

We demonstrated the use of the 3K-RG genomes and SNPs for trait 
mapping analyses for the highly heritable traits of grain length, grain 
width and bacterial blight resistance (Supplementary Notes). Major 
peaks for grain length with significantly (P value < 10~'°) associated 
markers are on chromosomes 1, 3, 5, 6 and 7, and minor peaks are on 
chromosomes 4, 9, 10 and 11 (Extended Data Fig. 10a). Major peaks 
for grain width are found on chromosomes 1 and 5, with minor peaks 
on chromosomes 3 and 9 (Extended Data Fig. 10b). Genome-wide 
association study (GWAS) peaks were concordant with known loci, 
including GS3, GW5 and qGL7 for grain length, and GW5 for
grain width. For grain width, the chromosome 9 novel peak coincides
with OsFD1, which codes for a bZIP transcription factor involved in
flowering time and developmental plasticity (its pleiotropic regulatory
function may therefore also affect grain width). Twelve peaks were
detected for bacterial blight resistance to strain C5 of Xanthomonas
oryzae, with the largest clustered around the resistance gene Xa26
on chromosome 11 (Extended Data Fig. 10c). Moreover, correlation 
between gene PAVs and plant height detected the well-known green 
revolution gene (sd1) as the first-ranked candidate. sd1 is classified 
as a distributed gene—caused by an approximately 385-bp deletion— 
and is significantly (P value < 10~°) associated with greatly reduced 
plant height; it was absent most frequently in XI-1A and XI-1B varieties 
(Extended Data Fig. 11). 


Discussion 

We characterized genetic variation in the 3,010 sequenced acces- 
sions of O. sativa and found a high level of genetic diversity in rice. 
Although the 3K-RG analysis is expected to identify nearly all poly- 
morphisms with MAF > 1%, our simulations suggest that it includes 
<40% of rare bi-allelic SNPs (MAF < 1%) in the International Rice 
Gene bank at the International Rice Research Institute (Extended 
Data Fig. 1c). We also characterized structural variation, and 
found that the average number of SVs between pairs of XI genomes 
(>12,000) was similar to that between two high-quality reference 
XI genomes. The vast majority were deletions and translocations
distributed across the genome (Extended Data Fig. 4c). Medium- 
sized SVs (>500 kb) were mostly inversions and duplications, and 
a large percentage of them (37.9%) occur differentially between XI 
and GJ. We speculate that large numbers of SVs may contribute to 
the varying degrees of hybrid sterility and hybrid breakdown between 
XI and GJ accessions. We also report pan-genome analyses for
O. sativa, and the high numbers of PAVs highlight another compo- 
nent of within-species diversity for rice. 

Our analysis brings more resolution to the within-species diversity 
of O. sativa (Extended Data Fig. 8e). Larger pan-genomes occur in XI 
than GJ accessions, but GJ accessions have more core genes than XI 
(Supplementary Data 3 Table 3), a result that was expected given the 
greater diversity within XI than GJ. This may relate to differences in 
eco-geographical distribution: GJ accessions experience harsher high- 
altitude and/or high-latitude environments, versus the less harsh but 
more diverse environments experienced by XI rice. Understanding the 
major group/subpopulation-core, -unbalanced and -predominant gene 
functions is expected to shed light on environmental adaptation of rice
variety groups over thousands of years. 

Although the 3K-RG population structure analyses based on SNPs 
and SVs were consistent with the five major groups that were pre- 
viously known, additional subpopulations in the XI and GJ groups 
were identified and were suggestive of nine subpopulations that are 
correlated with geographic origin. Large numbers of SNPs, genes and 
gene families, and SVs were found to be unique to or predominant in 
single subpopulations. Varying patterns of diversity reduction across 
different rice subpopulations were observed in and around about 1,000 
well-characterized genes. A closer look at patterns of haplotype sharing 
at domestication genes suggests that not all ‘domestication’ alleles came
to XI from GJ. Taken together, our results, combined with archaeological
evidence of XI cultivation for >9,000 years in both India and
China, support multiple independent domestications of O. sativa.

Our 3K-RG analysis highlights the genetic diversity that exists in 
rice germplasm repositories, and the usefulness of establishing a digital 
gene bank in which all accessions can be sequenced and catalogued. For 
example, we estimate that sequencing the rest of the gene bank of the 
International Rice Research Institute may enable the identification of 
>27 million additional SNPs (Extended Data Fig. 1d). The next chal- 
lenge will be to examine associations of the 3K-RG genetic variation 
with agriculturally relevant phenotypes measured under multiple field 
and laboratory environmental conditions; this will guide and acceler- 
ate rice breeding by identifying genetic variation that will be useful in 
breeding efforts and future sustainable agriculture. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0063-9.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessments. 

Sequencing data of the 3,000 Rice Genome project. The selection and sequencing 
of rice accessions have previously been described. The SNPs/indels and SVs in
3,010 accessions were identified by mapping against the Nipponbare RefSeq, and 
the pan-genome sequence was created by integrating the Nipponbare RefSeq and 
non-redundant novel sequences derived from 3,010 rice assemblies. SV compari- 
son and gene PAV analyses focused on 453 rice accessions with sequencing depth 
>20x and mapping depth >15x (Extended Data Figs. 4a, 5b). 

Detection of SNPs and indels. Reads were aligned to the Nipponbare RefSeq 
using BWA-MEM (release 0.7.10). The mapped reads were then sorted and
duplicates were removed by Picard tools (release 1.119)
(http://broadinstitute.github.io/picard/). The reads around indels were realigned by GATK
RealignerTargetCreator and IndelRealigner package (release 3.2-2). The variants
were called for each accession by the GATK UnifiedGenotyper (release 3.2-2)
with ‘EMIT-ALL-SITES’ option. A joint genotyping step for comprehensive SNP 
union and filtering step was performed on the 3,010 emit-all-sites VCF files. A 
variant position is reported if at least one sample supports it with QUAL no less 
than 30. A total of 29,399,875 SNPs (27,024,796 are bi-allelic) and 2,467,043 indels 
(small insertions and deletions <40 bp) were identified from the analyses of the 
genomes of 3,010 accessions. Three subsets of the 3K-RG Nipponbare SNPs were 
defined using the following filtering criteria: (1) a base SNP set of ~17 million 
SNPs created from the ~27 million high-quality bi-allelic SNPs by removing 
SNPs in which heterozygosity exceeds Hardy-Weinberg expectation for a partially 
inbred species, with inbreeding coefficient estimated as $1 - H_{\mathrm{obs}}/H_{\mathrm{exp}}$, in which
$H_{\mathrm{obs}}$ and $H_{\mathrm{exp}}$ are the observed and expected heterozygosity, respectively (detailed
in Supplementary Notes); (2) a filtered SNP set of ~4.8 million SNPs created from 
the ~17-million-SNP base SNP set by removing SNPs with >20% missing calls 
and MAF < 1%; and (3) a core SNP set of SNPs derived from the filtered SNP 
set using a two-step linkage disequilibrium pruning procedure with PLINK,
in which SNPs were removed by linkage disequilibrium pruning with a window
size of 10 kb, window step of one SNP and $r^2$ threshold of 0.8, followed by another
round of linkage disequilibrium pruning with a window size of 50 SNPs, window
step of one SNP and $r^2$ threshold of 0.8.
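
The heterozygosity filter in criterion (1) can be illustrated with a minimal sketch. The exact filtering rule is detailed in the Supplementary Notes and is not reproduced here; the function names and the `slack` tolerance below are hypothetical and serve only to show how the estimate $1 - H_{\mathrm{obs}}/H_{\mathrm{exp}}$ relates to the Hardy-Weinberg expectation.

```python
def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg expected heterozygote frequency, 2p(1 - p), for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def inbreeding_coefficient(h_obs: float, h_exp: float) -> float:
    """Inbreeding coefficient estimated as 1 - H_obs / H_exp."""
    if h_exp <= 0.0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


def exceeds_inbred_expectation(h_obs: float, p: float, f: float, slack: float = 0.0) -> bool:
    """True if observed heterozygosity is above (1 - F) * 2p(1 - p) + slack.

    `slack` is a hypothetical tolerance, not a value taken from the paper.
    """
    return h_obs > (1.0 - f) * expected_heterozygosity(p) + slack
```

For example, a SNP at allele frequency 0.5 with 2.5% heterozygous calls gives an estimate of about 0.95, whereas a SNP at the same frequency with 20% heterozygous calls would be flagged under an assumed coefficient of 0.95.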

Determining the effects of SNPs. The effects of all bi-allelic SNPs (low, moderate
and high effects) on the genome were determined based on the pre-built release
7.0 annotation from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/)
using SnpEff release 4.11, with parameters -v -noLog -canon rice7.
Using sequence ontology terms, a low-effect SNP was classified as ‘synonymous_variant’,
‘splice_region_variant’, ‘initiator_codon_variant’,
‘5_prime_UTR_premature_start_codon_gain_variant’ or ‘stop_retained_variant’. A moderate-effect
SNP was identified as a ‘missense_variant’ and a high-effect SNP as a ‘start_lost’,
‘stop_gained’, ‘stop_lost’, ‘splice_donor_variant’ or ‘splice_acceptor_variant’.
For indel effects, only indels with lengths that were not multiples of three were 
counted and SNPs overlapped with protein-coding regions (CDSs of RGAP 7 
genes) were considered as the most disruptive effects on genes. Results of the SNP 
and indel effect analysis are given in Supplementary Data 2 Tables 3, 4. We com- 
puted the SNP numbers (proportions) of rare SNPs and homozygous singletons 
for a ‘typical genome’ of a subpopulation as the median SNP number (proportion) 
of the SNPs in a given category among those genomes for that subpopulation 
(Supplementary Data 2 Table 5). 

Population structure and SNP diversity. Multi-dimensional scaling analysis 
was performed using the ‘cmdscale’ function in R, using the IBS distance matrix 
of the 3K-RG genomes computed with PLINK on the filtered SNP set. The
same distance matrix was used to construct a phylogenetic tree by the unweighted
neighbour-joining method, implemented in the R package phangorn. The
population structure of the 3K-RG dataset was analysed using ADMIXTURE
software on the core SNP set (version 0.4, http://snp-seek.irri.org/download.zul).
First, ADMIXTURE was run on 30 random 100,000-SNP subsets of the core
SNP set with k (the number of groups) ranging from 5 to 18, and k=9 was chosen 
because it was the minimal value of k to separate all previously known groups 
(cA, cB, XI, GJ-trp, GJ-tmp and part of GJ-sbtrp). With k=9, ADMIXTURE was 
then run again on the whole core SNP set nine times with varying random seeds; 
the Q-matrices were aligned using CLUMPP software and clustered on the basis
of similarity. Then, the matrices belonging to the largest cluster were averaged to 
produce the final matrix of admixture proportions. Finally, the group membership 
for each sample was defined by applying the threshold of > 0.65 to this matrix. 
Samples with admixture components <0.65 were classified as follows. If the sum 
of components for subpopulations within the major groups XI and GJ was > 0.65, 
the samples were classified as XI-adm or GJ-adm, respectively, and the remaining 
samples were deemed ‘fully’ admixed (admix). Branches of the phylogenetic tree 
were coloured according to the k= 9 admixture classification (Fig. 1). 
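
The group-assignment rule described above can be sketched as follows. This is an illustration only: the function name is hypothetical, the subpopulation labels are those used in this paper, and treating a component of exactly 0.65 as passing the threshold is an assumption, because the text does not state how the boundary is handled.

```python
XI_SUBPOPS = ("XI-1A", "XI-1B", "XI-2", "XI-3")
GJ_SUBPOPS = ("GJ-tmp", "GJ-trp", "GJ-sbtrp")


def classify_sample(q: dict, threshold: float = 0.65) -> str:
    """Assign a group label from one row of the averaged admixture matrix.

    `q` maps subpopulation name to admixture proportion.
    """
    best_group, best_value = max(q.items(), key=lambda item: item[1])
    if best_value >= threshold:
        return best_group
    # No single component passes: check the summed XI and GJ components.
    if sum(q.get(name, 0.0) for name in XI_SUBPOPS) >= threshold:
        return "XI-adm"
    if sum(q.get(name, 0.0) for name in GJ_SUBPOPS) >= threshold:
        return "GJ-adm"
    return "admix"
```

A sample with proportions 0.4 (XI-1A), 0.3 (XI-2) and 0.3 (cA) has no single component above the threshold, but its XI components sum to 0.7, so it would be labelled XI-adm.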


We computed linkage disequilibrium decay in each subpopulation as follows. 
The value of $r^2$ was computed for each pair of SNPs of frequency > 10% in the
respective subpopulations that are separated by at most 300 kb using PLINK. The
distances were binned into 1-kb bins (separately for each chromosome) and the
median value of $r^2$ in each bin was taken. The medians for each chromosome were
then averaged to produce a final $r^2$ estimate for the bin. We computed nucleotide
diversity ($\pi$) for non-overlapping 10-kb and 100-kb windows along the Nipponbare
RefSeq by adopting an approach similar to VariScan for genome-wide DNA
polymorphism analyses and implemented as a custom R script. 
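
The binning step of the linkage disequilibrium decay calculation can be sketched as below. This is not the authors' script; it assumes that pairwise values have already been computed (for example, by PLINK) and are supplied as hypothetical `(chromosome, distance_bp, r2)` tuples.

```python
from collections import defaultdict
from statistics import mean, median


def ld_decay(pairs, bin_size=1000, max_dist=300_000):
    """Return {bin index: r^2 estimate} from (chrom, distance_bp, r2) tuples.

    Within each chromosome, r^2 values are grouped into bins of `bin_size` bp
    and the median is taken; the per-chromosome medians are then averaged.
    """
    per_chrom = defaultdict(lambda: defaultdict(list))
    for chrom, dist, r2 in pairs:
        if 0 < dist <= max_dist:
            per_chrom[chrom][dist // bin_size].append(r2)

    medians_by_bin = defaultdict(list)
    for bins in per_chrom.values():
        for bin_index, values in bins.items():
            medians_by_bin[bin_index].append(median(values))

    return {b: mean(m) for b, m in sorted(medians_by_bin.items())}
```

Taking the median within each chromosome before averaging keeps a single chromosome with many SNP pairs from dominating the estimate for a bin.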

Detection of genomic SVs and population differentiation. Genomic-SV detection
for each of the 3,010 rice accessions was performed using a customized version
of novoBreak (https://sourceforge.net/projects/novobreak/?source=navbar)
against the Nipponbare RefSeq. SVs inferred by no less than 3 reads were further
filtered with the following conditions: (1) more than four supporting split
reads or (2) no fewer than three discordant read pairs. We detected deletions,
inversions and duplications with sizes between 100 bp and 1 Mb, and translocations.
Here, translocations were SVs with ‘inter-chromosomal’ breakpoints. All
SVs that passed the filter criteria in the 3K-RG accessions were pooled together. 
Two adjacent SVs were identified as the same SV if their start and end positions 
varied no more than 1 kb, and the overlapping region was more than 50% of the 
total size. The presence-absence matrix of SVs in each accession was built based 
on this pooled SV dataset. To obtain reliable SV comparison analysis results, we 
focused only on the 453 high-depth accessions (Extended Data Fig. 4a). Major- 
group-unbalanced SVs were determined by two-sided Fisher's exact test followed 
by Benjamini-Hochberg adjustment (false discovery rate (FDR) < 0.05), similar 
to the detection of major-group-unbalanced genes. 
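
The rule for treating two adjacent SVs as the same event can be sketched as follows. The text does not define ‘total size’; this sketch assumes it is the span covered by the two SVs together, and it assumes the caller has already restricted the comparison to SVs of the same type on the same chromosome. The function name is hypothetical.

```python
def same_sv(a, b, max_shift=1000, min_overlap_fraction=0.5):
    """Decide whether two (start, end) intervals represent the same SV.

    Both breakpoints must differ by at most `max_shift` bp, and the overlap
    must exceed `min_overlap_fraction` of the combined span (an assumed
    reading of 'total size').
    """
    (start_a, end_a), (start_b, end_b) = a, b
    if abs(start_a - start_b) > max_shift or abs(end_a - end_b) > max_shift:
        return False
    overlap = min(end_a, end_b) - max(start_a, start_b)
    total = max(end_a, end_b) - min(start_a, start_b)
    return total > 0 and overlap > min_overlap_fraction * total
```

Two short SVs whose breakpoints both lie within 1 kb of each other but which do not overlap are kept separate by the second condition.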

De novo assembly. A variation of SOAPdenovo2 (version r240) with customized
k-mers was used to assemble the rice genomes. A k-mer value was initially
set for each accession according to a linear model, ‘K = 2*int(0.38*(sequencing
depth) + 10) + 1’, which was trained from 50 randomly selected rice accessions.
The best k-mer value was decided by checking the N50 of the SOAPdenovo
results. The command line for SOAPdenovo was ‘SOAPdenovo-63mer
(or SOAPdenovo-127mer) all -s configure_file (average insertion length set as
460 in the configure file) -K k-mer -R -F’ with iteration over different k-mers
until N50 of the assembly with that k-mer is higher than those with ‘k-mer + 2’
and ‘k-mer − 2’. On average, we needed to run SOAPdenovo ~3.94 times for
each rice accession. The quality of the genome assembly was evaluated for these
contigs using QUAST version 2.3.
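
The k-mer selection procedure amounts to a local search starting from the linear model. The sketch below is an illustration under stated assumptions: `n50_of` is a hypothetical callback that would run one assembly and return its N50, and the bounds `k_min` and `k_max` are placeholders rather than values from the paper.

```python
def initial_kmer(depth: float) -> int:
    """Starting k-mer from the linear model K = 2*int(0.38*depth + 10) + 1."""
    return 2 * int(0.38 * depth + 10) + 1


def best_kmer(depth, n50_of, step=2, k_min=13, k_max=127):
    """Move k by `step` until its N50 is not exceeded by either neighbour."""
    cache = {}

    def n50(k):
        if k < k_min or k > k_max:
            return float("-inf")  # outside the allowed range
        if k not in cache:
            cache[k] = n50_of(k)  # one assembly run per distinct k
        return cache[k]

    k = min(max(initial_kmer(depth), k_min), k_max)
    while True:
        better = max((k - step, k + step), key=n50)
        if n50(better) <= n50(k):
            return k
        k = better
```

Caching the N50 for each k-mer avoids repeating an assembly that has already been evaluated, which matters because each evaluation is a full SOAPdenovo run.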

Sequencing and de novo assembly of IR 8 and N 22 reference genomes. High 
molecular weight DNA was extracted from young leaves using a previously described
protocol with minor modifications. The PacBio library was prepared following the
20-kb protocol (see “User-Bulletin-Guidelines-for-Preparing-20-kb-SMRTbell- 
Templates document.pdf’, available from https://www.pacb.com/support/ 
documentation/?fwp_documentation_search="PN%20100-286-700-04") and 
was sequenced on an RSII sequencer with movie collection time of 6 h. The raw 
data of N 22 and IR 8 were assembled with FALCON and Canu, respectively.
Contigs were polished twice with PacBio raw reads using Quiver
(https://github.com/PacificBiosciences/GenomicConsensus) and the IR 8 assembly was further
polished with 66× WGS 2 × 150-bp Illumina data using Pilon. Polished contigs
were assigned to pseudomolecules using Genome Puzzle Master. Assembly statistics
can be found in Supplementary Data 3 Table 4. IR 8 and N 22 were used
to evaluate the completeness and redundancy of the pan-genome.

Pan-genome construction. SOAPdenovo assembly for each accession was assessed 
by QUAST with Nipponbare RefSeq as the reference. From QUAST output,
unaligned contigs longer than 500 bp were retrieved and merged. CD-HIT version 
4.6.1 was used to remove redundant sequences at a cutoff of 90% identity with 
the command ‘-c 0.9 -T 16 -M 50000’. For remaining sequences, all-versus-all
alignments with BLASTN were carried out to ensure that these sequences had no 
redundancy. Next, various contaminants including Archaea, bacteria, viruses, fungi 
and metazoans were removed. The non-redundant sequences were aligned to the 
NT database (downloaded from NCBI, 26 July 2014) with BLASTN with parameters 
‘-evalue 1e-5 -best_hit_overhang 0.25 -perc_identity 0.5 -max_target_seqs 10’.
Contigs of which the best alignments (considering E-values and identities) were 
not from Viridiplantae were considered as contaminants and were filtered out. 
The remaining contigs formed the non-redundant novel sequences. The rice 
species pan-genome was then generated by combining the Nipponbare RefSeq and 
non-redundant novel sequences. 

Annotation of the pan-genome. The gene-transcript annotation of the 
Nipponbare RefSeq was downloaded from the Rice Annotation Project, and
if a protein-coding gene contained multiple transcripts only the transcript with
the longest open reading frame was selected as the representative for the gene.
Protein-coding genes on novel sequences were predicted using MAKER, a gene
prediction tool combining ab initio predictions, expression evidence and protein
homologies. In detail, repeats were first masked (soft mask for low-complexity
repeats) with RepeatMasker (www.repeatmasker.org) and RepeatRunner. Two ab
initio predictors, SNAP and AUGUSTUS, were called by MAKER to predict
gene models with their default parameters for rice. All rice expressed sequence tags
(ESTs) were downloaded from GenBank (15 December 2014) and were aligned
to the novel sequences with BLASTN. All rice proteins were downloaded from
NCBI (15 December 2014) and were aligned to the novel sequences with BLASTX.
To obtain more informative alignments, Exonerate was used to realign each
sequence identified by BLAST around splice sites. EVidenceModeller was used
to combine and refine the ab initio predictions with RNA and protein evidence.
Incomplete gene models were removed before subsequent analysis.

Adjustment of predicted genes. We aligned the predicted transcripts against
Nipponbare RefSeq to remove potential redundancy. Redundant genes were 
removed when the genes were clustered into gene families. However, when attempt- 
ing to identify the number of novel genes, the redundant ones were removed first. 
We clustered all genes at a global identity of 95%, and removed novel genes that 
were not representative of the group. 

Evaluation of pan-genome redundancy. We ran BUSCO (benchmarking universal
single-copy orthologues) v.2.0 on CX140 (a Nipponbare accession) assembly,
Nipponbare RefSeq, CX368 (an N 22 accession) assembly, N 22 high-quality reference
genome and the pan-genome sequences. Augustus-3.2.3 and hmmer-3.1b
were used for gene prediction in BUSCO. BUSCO was run with genome mode with 
embryophyta_odb9 as a reference. 

Functional analysis. All protein sequences of pan-genome were extracted and 
aligned to the GO sequence database (http://geneontology.org/ on 4 April 2015) 
with BLASTP. Only alignments with E-values < 1 x 10~° and identity > 0.3 were 
used. GO terms for each gene were estimated to be the same as those of its best-hit 
protein. In total, 20,842 (43.3%) genes could be annotated. For a gene family, its 
GO terms are the non-redundant set of the GO terms of the genes within this gene 
family. Overall, 6,338 (26.5%) gene families could be annotated. Enrichment of 
GO terms was carried out using the GOstats package in R with all gene families
as the background. 

Validation of the non-Nipponbare RefSeq genes. We verified the novel genes by 
multiple approaches. First, for each gene, we examined the number of accessions 
that possessed it. We mapped the sequencing reads to the pan-genome sequences. 
Genes with CDS coverage over 0.95 and gene-body coverage over 0.85 were considered
to be present. Second, we verified the novel genes with 226 RNA sequencing
experiments from 17 projects. RNA sequencing reads were first trimmed
with Trimmomatic version 0.32 with parameters ‘ILLUMINACLIP:2:30:10
LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:36’ and then
aligned to the pan-genome sequences with a split-aligner HISAT2 version 2.0.1-beta
using default parameters. The coverage of each gene was calculated with
‘BEDtools coverage’ in BEDtools suite version 2.17.0.

Gene family annotation. The genes were clustered to gene families with 
OrthoMCL version 2.0.9. All genes were extracted and translated into protein
sequences and the protein sequences were compared by using all-by-all BLASTP 
(E-value= 1 x 10~°). OrthoMCL was applied to process the BLASTP output and 
cluster genes to gene families. Similarity of protein families was set to be 0.5 as 
suggested by a previous publication.

Determination of gene presence or absence. We proposed a ‘map-to-pan’ strategy
to determine gene presence or absence. For the 453 accessions with high
sequencing depth, although only about 60%-70% of their genomes can be de novo 
assembled (contig > 500 bp), more than 98% of their genomes can be covered 
by short read mapping. This enabled the use of coverage of genes to determine 
their presence or absence. In practice, genes with CDS coverage over 0.95 and 
gene-body coverage over 0.85 were considered present. If one member of a gene 
family is present in a given rice accession, the gene family is considered as present. 
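
The presence call can be written compactly, as in the sketch below. The thresholds are those stated above; the function names and the representation of coverage as fractions between 0 and 1 are assumptions made for illustration.

```python
def gene_present(cds_coverage: float, gene_body_coverage: float,
                 cds_min: float = 0.95, body_min: float = 0.85) -> bool:
    """A gene is called present if both coverage fractions exceed their thresholds."""
    return cds_coverage > cds_min and gene_body_coverage > body_min


def family_present(member_coverages) -> bool:
    """A gene family is present if at least one member gene is present.

    `member_coverages` is an iterable of (cds_coverage, gene_body_coverage) pairs.
    """
    return any(gene_present(cds, body) for cds, body in member_coverages)
```

Because a family needs only one present member, a family can be called present in an accession even when most of its members are absent.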

Determination of core and distributed genes or gene families. A core gene
(or gene family) is a gene (or gene family) present in all rice accessions, and we 
further defined candidate core genes (or gene families) as those with loss rates not 
significantly larger than 0.01 in all major groups. We first examined whether a gene 
(or a gene family) is distributed (loss rate > 0.01) in each type of O. sativa (XI, GJ, 
cA and cB). Binomial tests (with a null hypothesis of loss rate < 0.01) were carried 
out for each gene in each type. A P value below 0.05 meant that this gene (or a gene 
family) was lost in a significant proportion of rice accessions and is a distributed 
gene (or gene family) of these subpopulations. If a gene (or a gene family) was not 
determined to be distributed in all types (and it was not core), it was considered 
to be a candidate core gene (or gene family) of O. sativa. Other genes (or gene 
families) were considered to be distributed. 
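
The classification into core, candidate core and distributed genes can be sketched with a one-sided binomial test implemented from first principles. This is an illustration, not the authors' code: the function names and the input layout (a mapping from group name to a pair of lost and total accession counts) are hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def classify_gene(losses_by_group: dict, rate: float = 0.01, alpha: float = 0.05) -> str:
    """Classify a gene from {group: (accessions lacking it, accessions in group)}."""
    if all(lost == 0 for lost, _ in losses_by_group.values()):
        return "core"  # present in every accession
    for lost, total in losses_by_group.values():
        # Reject the null hypothesis (loss rate not above `rate`) in this group.
        if binom_upper_tail(lost, total, rate) < alpha:
            return "distributed"
    return "candidate core"
```

With group sizes of 303 (XI), 92 (GJ), 33 (cA) and 10 (cB), a gene missing from a single XI accession remains candidate core, whereas a gene missing from two of the ten cB accessions is called distributed.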

Determination of major-group-unbalanced, subpopulation-unbalanced and 
random genes or gene families. Distributed genes (or gene families) were divided 
further into major-group-unbalanced, subpopulation-unbalanced and random 
genes (or gene families). Major-group-unbalanced genes (or gene families) are 
defined as genes (or gene families) that are unequally distributed among XI, GJ, 
cA and cB groups. A two-sided Fisher's exact test was used to determine whether 
the distribution of each gene (or gene family) is uniform. The P values of all genes
were calculated with the ‘fisher.test’ function in R and were then adjusted with
the Benjamini-Hochberg FDR method. Genes (or gene families) with FDR < 0.05 
were considered as major-group-unbalanced. 
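
The multiple-testing adjustment can be illustrated with a short sketch of the Benjamini-Hochberg procedure. The Fisher's exact test itself is not reimplemented here; the input is assumed to be a list of P values already obtained from that test, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A gene (or gene family) would then be called major-group-unbalanced when its adjusted value is below 0.05.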

Subpopulation-unbalanced genes (or gene families) are defined as genes 
(or gene families) that are unequally distributed among subpopulations; thus, 
they can be divided into XI-subpopulation-unbalanced genes (or gene families) 
and GJ-subpopulation-unbalanced genes (or gene families). XI-subpopulation- 
unbalanced genes (or gene families) are defined as genes (or gene families) that 
are unequally distributed among XI-1A, XI-1B, XI-2 and XI-3 subpopulations. 
GJ-subpopulation-unbalanced genes (or gene families) can be defined similarly. 
The same statistical methods for the major groups were applied to determine 
the distribution balance for subpopulations. We defined genes (or gene families) 
that are neither major-group-unbalanced nor subpopulation-unbalanced to be 
‘random’ genes.

Gene and gene-family age. Gene ages were inferred with previously described 
methods. The NR protein database was downloaded from NCBI (28 March
2015) and all protein sequences were grouped according to 13 taxonomic levels 
(PS1: Cellular organisms; PS2: Eukaryota; PS3: Viridiplantae; PS4: Streptophyta, 
Streptophytina; PS5: Embryophyta; PS6: Tracheophyta, Euphyllophyta; PS7: 
Spermatophyta; PS8: Magnoliophyta, Mesangiospermae; PS9: Liliopsida, 
Petrosaviidae, Commelinids, Poales; PS10: Poaceae; PS11: BOP clade; PS12: 
Oryzoideae, Oryzeae, Oryza; and PS13: O. sativa) based on NCBI taxonomy. 
Thirteen BLASTP databases were built for protein sequences from PS1 to PS13. 
All genes on pan-genome sequences were first translated into proteins, and were 
aligned to the 13 databases using BLASTP with E-values < 1 x 10~° and identity 
> 0.3. The age of a gene was considered as the taxonomic level of the oldest aligned 
protein. Genes that failed to align to all databases were assigned gene ages of PS13 
(O. sativa). Some PS13 genes were reassigned as PS12 genes if they could be covered
by 446 wild rice genomes with both gene-body coverage > 0.95 and CDS
coverage > 0.95. The age of a gene family was considered as the age of the oldest 
gene within the gene family. 
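
The age-assignment rules can be sketched as follows. The sketch assumes that the BLASTP filtering has already been done and that each gene is represented by the set of phylostratum indices (1 for PS1 through 13 for PS13) in which it has a qualifying hit; the function names and the `covered_by_wild_rice` flag are hypothetical.

```python
YOUNGEST = 13  # PS13: O. sativa


def gene_age(hit_levels, covered_by_wild_rice: bool = False) -> int:
    """Phylostratum of a gene: the oldest (smallest index) level with a hit.

    Genes without any hit default to PS13; a PS13 gene is moved to PS12
    when it is covered by the wild rice genomes.
    """
    age = min(hit_levels) if hit_levels else YOUNGEST
    if age == YOUNGEST and covered_by_wild_rice:
        age = YOUNGEST - 1
    return age


def family_age(member_ages) -> int:
    """A gene family takes the age of its oldest member gene."""
    return min(member_ages)
```

For example, a gene with hits at PS5, PS9 and PS13 is assigned to PS5, and a family containing genes of ages PS13, PS7 and PS12 is assigned to PS7.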

Introgression test. To test whether an XI sample had a non-introgression haplotype
at a locus, we defined a D-value for a sample x as $D(x) = d(x,\mathrm{XI}) - d(x,\mathrm{GJ})$, in
which $d(x,\mathrm{XI})$ is the mean distance from sample x to an XI sample at the given locus
and $d(x,\mathrm{GJ})$ is defined analogously for GJ samples.

With no gene flow from GJ to XI and vice versa, the D-value is negative for XI 
and positive for GJ. On the other hand, if an XI sample shares a haplotype with a GJ 
sample, the D-value will be positive and close to the D-values of GJ samples. For an 
XI sample, we rejected the hypothesis of GJ introgression if its D-value was negative 
and less than the lower bound of the 99% confidence interval for the D-value of 
GJ samples, which was computed on the subset of GJ consisting of samples with a 
positive D-value, to exclude the effect of potential XI-to-GJ introgression. 
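
The D-value and the rejection rule can be sketched as below. Several details are assumptions made for illustration: haplotypes are represented as equal-length strings, distance is taken to be the Hamming distance, the sample itself is not excluded from the XI set, and the lower confidence bound for GJ samples is supplied by the caller because its computation is not specified here.

```python
from statistics import mean


def hamming(h1: str, h2: str) -> int:
    """Number of differing positions between two equal-length haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have equal length")
    return sum(a != b for a, b in zip(h1, h2))


def d_value(x: str, xi_samples, gj_samples, dist=hamming) -> float:
    """D(x) = mean distance to XI samples minus mean distance to GJ samples."""
    d_xi = mean([dist(x, s) for s in xi_samples])
    d_gj = mean([dist(x, s) for s in gj_samples])
    return d_xi - d_gj


def rejects_gj_introgression(d: float, gj_lower_bound: float) -> bool:
    """Reject GJ introgression if D is negative and below the GJ lower bound."""
    return d < 0 and d < gj_lower_bound
```

With XI haplotypes `AAAA` and `AAAT` and GJ haplotypes `TTTT` and `TTTA`, the sample `AAAA` has a D-value of -3.0 and the sample `TTTT` has a D-value of 3.0, matching the expected signs for the two groups.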

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper. 

Code availability. Code for studying pan-genome and gene and gene family
PAVs is now integrated and published as the EUPAN toolkit. Tailored novoBreak-germline
is available at https://sourceforge.net/projects/novobreak/?source=navbar.
Code for nucleotide diversity and SNP merging is available at
https://github.com/dchebotarov/3k-SNP-paper. All other code is available from 
the corresponding authors upon request. 

Data availability. The BAM alignment file and variant calls in VCF format for 
each accession of the 3K-RG against Nipponbare RefSeq are freely downloadable 
from Amazon Public Data at https://aws.amazon.com/public-data-sets/3000-rice-genome/
and the Department of Science and Technology Advanced Science and
Technology Institute of the Philippines (DOST-ASTI) IRODS site, as described
on the 3K-RG project site (http://iric.irri.org/resources/3000-genomes-project).
The SV and PAV data of 3K-RG are available in the figshare database
(https://doi.org/10.6084/m9.figshare.c.3876022.v1).

The following web tools are available for the mining, analysis and visualization 
of the 3K-RG dataset: SNP-Seek, http://snp-seek.irri.org; RMBreeding databases, 
http://www.rmbreeding.cn/index.php; rice cloud of genetic data public projects, 
http://www.ricecloud.org/; IRRI Galaxy, http://galaxy.irri.org/; and the 3,000 rice 
pan-genome browser, http://cgm.sjtu.edu.cn/3kricedb/.

The 3K-RG sequencing data used for our analyses can be obtained via project 
accession PRJEB6180 from NCBI (https://www.ncbi.nlm.nih.gov/sra/?term=PRJEB6180),
accession ERP005654 from DDBJ (https://www.ddbj.nig.ac.jp/index-e.html)
and from the GigaScience Database (https://doi.org/10.5524/200001).

Extended Data Fig. 1 | SNP filtering, discovery rate, and projected
discovery upon further sequencing. a, Proportion of heterozygous
calls versus allele frequency. Each dot represents a SNP from a random
sample of 100,000 SNPs. Blue curve shows theoretical Hardy-Weinberg
equilibrium. The points have opacity of 5%, such that regions with higher
point densities are highlighted. The bulk of SNPs lie on the Hardy-Weinberg
equilibrium curve scaled by a factor of about 0.05, which implies
a Wright's inbreeding coefficient of F = 0.95. b, The same plot with colour
Extended Data Fig. 6 | Representative enriched biological processes
of core and distributed gene families. a, b, Representative enriched
biological processes of core (a) and distributed gene families (b) are
shown, with all terms sorted by their enriched P values (red bars).
One-sided hypergeometric test built in the GOstats R package was used
to calculate the P value of each GO term. The numbers of gene families
involved in each GO term are shown in blue.
Extended Data Fig. 7 | Characterization of gene or gene family presence
or absence variations. a, b, Phylogenetic trees of the 453 rice accessions
constructed on the basis of the presence or absence of the distributed genes
(a) and gene families (b); both of which classified the 453 accessions into
two major groups (XI and GJ), with each being further divided into several
subpopulations that are tagged with different colours representing their
classifications based on the SNP. c, d, Gene (c) or gene family (d) numbers
per accession in different subpopulations; gene or gene family numbers
were significantly different among XI subpopulations (Kruskal-Wallis
tests, P value = 9.8 × 10⁻⁸ (gene) or 1.0 x 10~° (gene family)). Box plots
show the median, box edges represent the first and third quartiles and the
whiskers extend to 1.5× interquartile range. e, The average number of
genes that are different between two accessions in which all combinations
of the 453 accessions were considered, and the proportions were calculated
as the number of such differentiating genes adjusted by the gene numbers
held in common by the two genome types. f, Venn diagram of the numbers
of the core + candidate core gene families among the major groups of
O. sativa. g, Cluster analysis of 4,270 XI-subpopulation-unbalanced gene
families and 1,384 GJ-subpopulation-unbalanced gene families.
Rc show the presence of many distinctly non-GJ haplotypes that carry
on source accessions for grain length and grain width (223,743 SNPs)
variation with plant height (m = 323). The P values were calculated with
Spearman's correlation. b, Examples show that semi-dwarfism results
frequencies in rice subpopulations.
preservation in liver transplantation
Preservation in Europe


Liver transplantation is a highly successful treatment, but is severely limited by the shortage in donor organs. However,
many potential donor organs cannot be used; this is because sub-optimal livers do not tolerate conventional cold storage
and there is no reliable way to assess organ viability preoperatively. Normothermic machine perfusion maintains the liver
in a physiological state, avoids cooling and allows recovery and functional testing. Here we show that, in a randomized trial
with 220 liver transplantations, compared to conventional static cold storage, normothermic preservation is associated
with a 50% lower level of graft injury, measured by hepatocellular enzyme release, despite a 50% lower rate of organ
discard and a 54% longer mean preservation time. There was no significant difference in bile duct complications, graft
survival or survival of the patient. If translated to clinical practice, these results would have a major impact on liver
transplant outcomes and waiting list mortality.


Liver transplantation is the accepted treatment for end-stage liver 
failure, with one and five year survivals in excess of 90% and 70%, 
respectively'. With increasing rates of liver disease’, the supply of 
transplantable organs is no longer able to meet demand. Paradoxically, 
despite substantial waiting list mortality (for example, 21% in the UK), 
only 63% of UK deceased donor livers are transplanted’. Increasing 
numbers of deceased organ donors in many countries have not been 
matched by a corresponding rise in the number of transplantable 
organs. This is mainly because these additional donors tend to be 
high-risk—either declared dead by cardiovascular criteria (DCD), as 
opposed to brainstem death donors (DBD), or elderly with multiple 
co-morbidities (extended criteria donors). Such organs pose a greater 
risk to the recipient, with a higher probability that the liver will never 
function (primary non-function (PNF)) or that it will lead to later com- 
plications, particularly biliary stricturing. 

Despite many advances in liver transplantation, the method of 
organ preservation has changed very little in almost 30 years. The 
liver is flushed and cooled with specialist preservation fluid, then 
stored in an icebox. This process of static cold storage (SCS) has 
several limitations. Although SCS slows metabolism by 10- to 12-fold, 
substantial anaerobic activity continues even at ice temperature’. 
This leads to ATP depletion and accumulation of succinate and other 
metabolites. These lead to the generation of reactive oxygen species® 
that are the basis of ischaemia-reperfusion injury, when the organ 
is re-exposed to oxygenated blood at the time of transplantation. 
This damage, exacerbated by any prior injury, limits the maximum safe
preservation time of the donor organ. Once cooled, the cessation of
normal cellular activity also makes functional assessment impossible.

These shortcomings are particularly problematic in the higher-risk 
donor organs that form an increasing proportion of current liver 
transplant practice. The very severe ischaemia-reperfusion-related
morbidity that characterizes transplantation of such organs is now a
major limitation in meeting the demand for life-saving transplants. To
combat the limitations imposed by cold storage, a change in preser- 
vation technology is required. In recent years, interest has developed 
in perfusion at physiological temperature (normothermic machine 
perfusion (NMP))*’. 

During NMP, the liver is perfused with oxygenated blood, med- 
ications and nutrients at normal body temperature to maintain a 
physiological milieu. Evidence from animal models of both DBD 
and DCD liver transplantation'®'! suggests that this improves 
the post-transplant survival of transplanted livers, and potentially 
enables the assessment of organ viability during preservation. The 
mechanism underlying these improved outcomes is at least partly 
related to the metabolic resuscitation of the organ that occurs with 
preservation under physiological conditions. This has been demon- 
strated through the replenishment of ATP levels!!, which in turn con- 
tributes to a reduction in the severity of the ischaemia-reperfusion 
injury that is experienced after transplant>’®. 

There is increasing interest in the clinical application of NMP, with
several cases described in the recent literature®’. In 2013, a phase-I
study by our group’ demonstrated the safety and feasibility of NMP in
20 liver transplant recipients. This was used as the precursor to the
present study which, to our knowledge, is the first randomized controlled
trial to test the efficacy of machine perfusion against conventional cold
storage in liver transplantation.

Fig. 1 | Image of liver during normothermic machine perfusion.
The hepatic artery (HA), portal vein (PV), inferior vena cava (IVC) and
common bile duct (CBD) are all cannulated. The gallbladder (GB) is also
present although this was often removed during the retrieval process before
NMP. This image has been used with consent from the family of the donor.

Livers from adult DBD or DCD donors were eligible for enrolment. 
Adult patients awaiting a liver-only transplant, excluding those with 
fulminant liver failure, were eligible. If a suitable liver was allocated 
to a consented recipient, the liver was randomized to either conven- 
tional SCS or NMP. In the SCS arm, the organ retrieval, storage and 
the transplant were conducted according to standard practice. In the 
NMP arm, following removal from the donor, the liver was attached to 
the OrganOx metra NMP device, where it was perfused throughout the 
duration of preservation (Fig. 1), until the transplanting surgeon was 
ready to implant it, at which point it was removed from the device. The 
remainder of the recipient's care followed standard practice. 

Daily during the first postoperative week, and at day 10, day 30, 
month 6 and month 12, biochemical results were recorded as well as 
graft and patient survival data. At six months, a magnetic resonance 
imaging scan of the biliary tree (MRCP) was performed to assess evi- 
dence of biliary injury. Biological samples were collected and stored in 
a biobank from each liver and recipient enrolled in the study, for use 
in further mechanistic studies. 

The primary endpoint was defined as the difference between the two 
treatment arms in the peak level of serum aspartate transaminase (AST) 
within seven days after transplant. This hepatocellular enzyme is a clini- 
cally accepted biomarker, predictive of graft and patient survival!”. 


Recruitment 
Between 26 June 2014 and 8 March 2016, 334 livers were randomized, 
with 64 livers subsequently excluded (Fig. 2). Following organ retrieval, 
a markedly different discard rate between the two trial arms resulted in 
100 SCS and 120 NMP livers available for primary outcome reporting, 
with 101 SCS and 121 NMP livers available for secondary outcome anal- 
ysis. This discrepancy in group size reduced the study power to 89.7%. 
One NMP liver was cold stored due to an accessory left hepatic artery 
arising from the aorta preventing effective cannulation. Eight NMP 
livers received machine perfusion for less than four hours (for logistic 
rather than technical reasons). All of these organs are included in the
NMP arm as part of the modified intention-to-treat analysis. For the
per protocol sensitivity analysis, the eight livers perfused for less than 
four hours were excluded and the single NMP liver that was preserved 
using SCS was reassigned to the SCS group. 
Number of livers randomized: n = 335
  Excluded: n = 1 (randomized in error: R&D not in place)

Allocated to NMP: n = 170
  Excluded: n = 33
    DCD did not proceed: n = 17
    Non-consented recipient: n = 6
    Non-eligible donor organ: n = 8
    Other reasons: n = 2
  Included: n = 137
    Discarded: n = 16
    Successfully transplanted: n = 121
  Analysed (primary outcome): ITT n = 120; PP n = 111
    Not analysed: n = 1 (no AST available after transplant)
  Outcome at 1 year:
    Alive with functioning graft: n = 112
    Died with functioning graft: n = 3
    Re-transplanted: n = 3
    Death with graft failure: n = 3

Allocated to SCS: n = 164
  Excluded: n = 31
    DCD did not proceed: n = 20
    Non-consented recipient: n = 6
    Non-eligible donor organ: n = 4
    Other reasons: n = 1
  Included: n = 133
    Discarded: n = 32
    Successfully transplanted: n = 101
  Analysed (primary outcome): ITT n = 100; PP n = 101
    Not analysed: n = 1 (no AST available after transplant)
  Outcome at 1 year:
    Alive with functioning graft: n = 95
    Died with functioning graft: n = 2
    Re-transplanted: n = 2
    Death with graft failure: n = 2

Fig. 2 | CONSORT diagram. CONSORT diagram depicting the outcome
for all donor livers enrolled in the trial. ITT, intention to treat; PP, per
protocol; R&D, research and development.


Donor, preservation and recipient characteristics 
NMP and SCS donor and recipient groups were well-matched (Tables 1, 2).
The discard rate was higher in the SCS arm (24.1%; 32 out of 133)
than the NMP arm (11.7%; 16 out of 137; Extended Data Table 1).
This difference was statistically significant (−12.4%, 95% confidence
interval −21.4 to −3.3%; P = 0.008). One NMP discard was the result
of a device malfunction in an already marginal organ (hepatic artery
hypoperfusion due to pinch valve miscalibration; see Supplementary
Information).
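
The discard-rate comparison above can be checked with a short calculation. The sketch below is illustrative only: the text does not state how the interval was computed, so the unpooled Wald interval for a difference in proportions is an assumption, and the function name `risk_difference` is hypothetical. With the reported counts (16 of 137 NMP livers and 32 of 133 SCS livers discarded) it gives values close to the reported −12.4% (−21.4 to −3.3%).

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Difference in proportions (a minus b) with an unpooled Wald interval.

    The Wald method is an assumption made for this sketch; the trial report
    does not name the method used for its confidence interval.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Reported discards: NMP 16 of 137 retrieved livers, SCS 32 of 133.
diff, lower, upper = risk_difference(16, 137, 32, 133)
print(f"{diff * 100:.1f}% ({lower * 100:.1f}% to {upper * 100:.1f}%)")
```

A different interval method (for example Newcombe's) would shift the bounds slightly, so agreement with the reported figures suggests, but does not confirm, that a Wald-type interval was used.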

Functional warm ischaemia time applies only to DCD livers and was 
measured as the time from the onset of donor hypoxia (oxygen satu- 
ration < 70%) or hypoperfusion (systolic blood pressure < 50 mmHg) 
until the start of cold aortic perfusion in the donor. The median 
functional warm ischaemia time was longer for NMP than SCS livers 
(21 min versus 16 min; P=0.003). 

Total preservation time was measured from the start of cold aortic
perfusion in the donor until graft reperfusion in the recipient. The
median total preservation time was longer for NMP than SCS livers
(11 h 54 min versus 7 h 45 min; P < 0.001). Within the NMP
arm, there was no significant difference in median perfusion time
between DBD and DCD livers (9 h 55 min DBD versus 8 h 45 min
DCD; P = 0.449).

Post-reperfusion haemodynamics were documented in 218 cases:
post-reperfusion syndrome was more common in the SCS (32 out of
97) than the NMP group (15 out of 121), a statistically significant
difference (−20.6%, 95% confidence interval −31.6 to −9.5%; P < 0.001).
This was despite reduced requirement for vasopressors in NMP livers
in the post-reperfusion period (Extended Data Table 2a–c).
Table 1 | Donor demographic details

Stratification factors for all
randomized livers                       NMP (n = 170)               SCS (n = 164)
Donor typeᵃ
  DBD                                   107 (62.9%)                 104 (63.4%)
  DCD                                   63 (37.1%)                  60 (36.6%)

Donor demographics for all
retrieved livers                        NMP (n = 137)               SCS (n = 133)
Genderᵃ
  Female                                54 (39.4%)                  57 (42.9%)
  Male                                  81 (59.1%)                  76 (57.1%)
  Missing                               2 (1.5%)                    0 (0.0%)
Age                                     56 (45–67) (16–84)          56 (47–66) (20–86)
Ethnicityᵃ
  African-Caribbean                     3 (2.2%)                    1 (0.8%)
  Caucasian                             131 (95.6%)                 128 (96.2%)
  Other                                 1 (0.7%)                    4 (3.0%)
  Missing                               2 (1.5%)                    0 (0.0%)
Cause of death
  CVA                                   74 (54.0%)                  74 (55.6%)
  Hypoxia                               30 (21.9%)                  32 (24.1%)
  Trauma                                17 (12.4%)                  16 (12.0%)
  Other                                 14 (10.2%)                  11 (8.3%)
  Missing                               2 (1.5%)                    0 (0.0%)
Body mass index                         26.26 (23.66–30.52)         27.01 (23.74–30.56)
                                        (16.42–46.65)               (17.24–49.96)
  Missing                               2 (1.5%)                    0 (0.0%)
ET-Donor risk indexᵇ                    1.72 (1.47–2.09)            1.72 (1.50–2.10)
                                        (0.98–4.31)                 (1.06–3.49)
  Missing                               16 (11.7%)                  19 (14.3%)

CVA, cerebrovascular accident.
ᵃFrequency and column percentages are reported.
ᵇMedian, interquartile range (IQR, first brackets) and full range (second brackets) are reported.

Peak AST (primary outcome) 
Peak AST during the first 7 days after transplant was reduced by 49.4% 
in the NMP group compared to SCS when adjusted by centre and donor 
type (geometric mean ratio 0.506, 95% confidence interval 0.388-0.659; 
P<0.001). Unadjusted analysis (Student’s t-test) and sensitivity analysis 
undertaken in the per-protocol population confirmed these results. 
Subgroup analysis showed that the effect of NMP was different in 
the two donor types (test for interaction P= 0.012), although it was 
statistically significant in both subgroups; the reduction in geometric 
mean peak AST was greater in DCD (73.3%, 95% confidence interval 
53.7-84.6%; P< 0.001) than in DBD livers (40.2%, 95% confidence 
interval 19.3-55.7%; P=0.001). Subgroup analyses for the model for 
end-stage liver disease (MELD) score and Eurotransplant-donor risk 
index (ET-DRI) showed no statistically significant differences (data not 
shown). See Extended Data Table 3a, b and Extended Data Fig. 1 for 
further analysis. See Table 3 for full outcome results. 
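
The relationship between the reported geometric mean ratio and the quoted percentage reduction can be made explicit. The following is a minimal sketch, assuming the reduction is simply one minus the ratio; the function name `percent_reduction` is hypothetical, and the interval for the reduction is derived here from the reported ratio interval rather than quoted from the text.

```python
def percent_reduction(ratio):
    """Convert a ratio of geometric means (NMP / SCS) to a percentage reduction."""
    return (1.0 - ratio) * 100.0


# Reported adjusted geometric mean ratio and its 95% confidence interval.
gmr, ci_low, ci_high = 0.506, 0.388, 0.659

reduction = percent_reduction(gmr)
# The bounds swap places: the largest ratio corresponds to the smallest reduction.
reduction_low = percent_reduction(ci_high)
reduction_high = percent_reduction(ci_low)

print(f"{reduction:.1f}% ({reduction_low:.1f}% to {reduction_high:.1f}%)")
```

The same conversion links a ratio to the subgroup figures, which are reported directly as percentage reductions with their intervals.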


Early allograft dysfunction 

Data to assess early allograft dysfunction (EAD) rates were available in 
216 recipients: the odds of developing EAD in the NMP arm (12 out of 
119) were 74% lower than the SCS arm (29 out of 97; odds ratio 0.263, 
95% confidence interval 0.126-0.550; P < 0.001). A logistic regression 
model adjusted for donor type, MELD score and ET-DRI showed that 
the adjusted odds of EAD in the NMP arm were approximately 72% 
lower than in the cold storage arm (adjusted odds ratio 0.276, 95%
confidence interval 0.124–0.611; P = 0.002). The difference in EAD
rates was partly a result of the difference in peak AST (described above),
but also a reflection of differences in bilirubin. The median bilirubin
level in the first week postoperatively was lower in NMP recipients
(2.25 mg dl⁻¹, 95% confidence interval 1.23–4.28) than in the SCS
group (2.87 mg dl⁻¹, 95% confidence interval 1.52–5.00; P = 0.029).
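
The unadjusted odds ratio for EAD follows directly from the reported counts. The sketch below is an illustration under stated assumptions: the interval uses the log-odds (Woolf) method, which the text does not name, and `odds_ratio` is a hypothetical helper. The adjusted odds ratio comes from a logistic regression model and cannot be reproduced from these counts alone.

```python
import math


def odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Odds ratio (a versus b) with a Woolf (log-odds) confidence interval.

    The Woolf method is an assumption of this sketch, not a detail taken
    from the trial report.
    """
    a, b = events_a, n_a - events_a  # events and non-events in group a
    c, d = events_b, n_b - events_b  # events and non-events in group b
    ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Reported EAD counts: NMP 12 of 119 recipients, SCS 29 of 97.
ratio, lower, upper = odds_ratio(12, 119, 29, 97)
print(f"OR {ratio:.3f} ({lower:.3f} to {upper:.3f})")
print(f"Odds lower by about {(1 - ratio) * 100:.0f}%")
```

With these inputs the result agrees with the reported odds ratio of 0.263 (0.126–0.550) and the statement that the odds were 74% lower in the NMP arm.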


Biliary strictures on MRCP 

An MRCP was performed on 155 (81 NMP, 74 SCS) of the 222 transplanted
trial patients. There was no significant difference in the rate of
non-anastomotic strictures for DBD (NMP 7.4% (4 out of 54) versus SCS
5.4% (3 out of 55); P = 0.678) or DCD (NMP 11.1% (3 out of 27) versus
SCS 26.3% (5 out of 19); P = 0.180) livers. Only one patient in each trial
arm developed clinically relevant evidence of ischaemic cholangiopathy
in the first year after transplant, both of whom were re-transplanted.

Table 2 | Preservation and recipient demographic details

Preservation details for all
transplanted livers                       NMP (n = 121)           SCS (n = 101)           P valueᵃ
Functional warm ischaemia timeᵇ (min)     21 (17–25)              16 (10–20)              0.003
(applies to DCD livers;                   (9–93)                  (2–32)
n = 55 (34 NMP, 21 SCS))
Cold ischaemia time prior to NMP (min)ᶜ   126 (106.5–143.0)       NA
(n = 120)                                 (49–218)
Machine perfusion time (min)ᶜ (n = 120)   547.5 (372.5–710.5)     NA
                                          (85–1,388)
Total preservation time from cross-clamp  714 (542–876)           465 (375–575)           0.0000
in donor to organ reperfusion in          (258–1,527)             (223–967)
recipient (min)
Steatosis assessed pre-preservationᵈ,ᵉ                                                    0.366
  None or mild                            91 (75.3%)              89 (88.2%)
  Moderate or severe                      29 (24%)                12 (11.9%)
  Missing                                 1 (0.8%)

Recipient demographics                    NMP (n = 121)           SCS (n = 101)           P valueᵃ
Genderᵈ                                                                                   0.717
  Female                                  35 (28.9%)              27 (26.7%)
  Male                                    86 (71.1%)              74 (73.3%)
Donor typeᵈ                                                                               0.209
  DBD                                     87 (71.9%)              80 (79.2%)
  DCD                                     34 (28.1%)              21 (20.8%)
Age                                       55 (48–62)              55 (48–62)              0.713
                                          (20–72)                 (22–70)
Cause of liver failureᵈ                                                                   0.782
  Alcoholic                               36 (29.8%)              29 (28.7%)
  Auto-immune hepatitis                   2 (1.7%)                5 (5.0%)
  Hepatitis B                             3 (2.5%)                2 (2.0%)
  Hepatitis C                             4 (3.3%)                4 (4.0%)
  Hepatocellular carcinoma on             15 (12.4%)              16 (15.8%)
  background of cirrhosis
  Non-alcoholic steato-hepatitis          11 (9.1%)               11 (10.9%)
  Primary biliary cirrhosis               10 (8.3%)               3 (3.0%)
  Primary sclerosing cholangitis          18 (14.9%)              13 (12.9%)
  Other                                   22 (18.1%)              18 (17.8%)
Body mass indexᶜ                          26.18 (23.12–32.39)     26.94 (24.36–30.42)     0.626
                                          (18.02–50.99)           (18.91–42.95)
  Missing                                 0 (0.0%)                1 (1.0%)
Retransplantᵈ                             12 (9.9%)               8 (7.9%)                0.605
MELD scoreᶜ (calculated at time of        13 (10–18)              14 (9–18)               0.970
transplant)                               (6–35)                  (6–29)
  UK                                      13 (10–17)              14 (9–18)
                                          (6–33)                  (6–28)
  Essen, Germany                          17 (14–19)              15.5 (14–17)
                                          (13–23)                 (14–17)
  Barcelona, Spain                        16 (8–26)               14 (9–16)
                                          (8–35)                  (8–29)
  Leuven, Belgium                         19 (13.5–25.0)          16 (16–20)
                                          (13–26)                 (9–27)
eGFRᶜ                                     87.36 (69.61–107.66)    92.22 (69.72–104.24)    0.928
                                          (33.45–156.43)          (30.19–155.04)
  Missing                                 4 (3.3%)                3 (3.0%)
ET-Donor risk indexᶜ                      1.70 (1.47–2.07)        1.71 (1.50–2.01)        0.610
                                          (0.98–4.31)             (1.06–3.49)
  Missing                                 13 (10.7%)              13 (12.9%)

NA, not applicable. eGFR, estimated glomerular filtration rate.
ᵃχ² tests and non-parametric Mann–Whitney U-tests were used for categorical and continuous
variables, respectively. No adjustment for multiple comparisons was made.
ᵇFunctional warm ischaemia applies to DCD donors and is measured from the onset of functional
warm ischaemia (systolic blood pressure < 50 mmHg or O₂ saturation < 70%) to cross-clamp.
ᶜMedian, IQR and full range are reported.
ᵈFrequency and column percentages are reported.
ᵉMeasurement of the degree of steatosis was based on clinical assessment by the retrieval
surgeon.
Table 3

                                    NMP (n = 121)ᵃ          SCS (n = 101)ᵃ            Effect (95% CI)ᵇ          P value
Peak AST
ITTᶜ
  Adjusted                          488.1 (408.9–582.8)     964.9 (794.5–1,172.0)     0.5 (0.4–0.7)             0.0000
  Unadjusted                        484.5 (406.4–577.6)     973.7 (795.2–1,192.3)     0.5 (0.4–0.6)             0.0000
Test for interaction by donor type                                                                              0.012
Subgroup analysis by donor type
  DBD                               526.2 (427.3–647.9)     880.2 (708.5–1,093.5)     40.2% (19.3–55.7%)        0.0009
  DCD                               389.7 (278.0–546.4)     1,458.1 (944.7–2,250.5)   73.3% (53.7–84.6%)        0.0000
PP analysis                         498.6 (414.8–599.4)     982.9 (810.4–1,192.2)     0.5 (0.4–0.7)             0.0000

Secondary outcomes
Discard ratesᵈ                      16 (11.7%)              32 (24.1%)                −12.4% (−21.4 to −3.3%)   0.008
Primary non-functionᵉ               1 (0.8%)                0 (0.0%)                  NA                        NA
Post-reperfusion syndrome           15 (12.4%)              32 (33.0%)                −20.6% (−31.6 to −9.6%)   0.0002
Post-reperfusion lactateᶠ           3.6 (2.6–4.2)           4.1 (3.2–5.0)                                       0.018
Early allograft dysfunction         12 (10.1%)              29 (29.9%)                0.263 (0.126–0.550)       0.0002

Biochemical liver testsᶠ (average value over day 1–7)
Bilirubin (µmol l⁻¹)
  Days 1–7                          38.5 (21.0–73.2)        49.1 (26.0–85.5)                                    0.029
  30 days                           13.0 (8.0–22.1)         13.0 (9.1–21.0)                                     0.479
  6 months                          9.1 (6.0–15.1)          9.1 (6.0–13.0)                                      0.671
AST (IU l⁻¹)
  Days 1–7                          167.5 (98.0–320.7)      318.5 (152–611.5)                                   0.0000
  30 days                           20 (14–35)              22 (15–40)                                          0.707
  6 months                          23 (18–33)              23 (18–37)                                          0.931
γGT (IU l⁻¹)
  Days 1–7                          268.1 (156.3–408.3)     301 (201.1–443.9)                                   0.157
  30 days                           178 (109.5–410.0)       200 (96.0–397.5)                                    0.949
  6 months                          47 (28–144)             47 (26–128)                                         0.452
INR
  Days 1–7                          1.2 (1.2–1.4)           1.2 (1.2–1.4)                                       0.644
  30 days                           1.1 (1.0–1.2)           1.1 (1.0–1.2)                                       0.735
  6 months                          1.1 (1.0–1.2)           1.1 (1.0–1.1)                                       0.167
Creatinine (µmol l⁻¹)
  Days 1–7                          92.8 (60.1–121.1)       97.2 (67.2–143.2)                                   0.139
  30 days                           82.2 (66.3–104.3)       90.2 (72.5–121.1)                                   0.019
  6 months                          99.9 (81.3–117.6)       99.9 (83.1–134.4)                                   0.265
Lactate (mmol l⁻¹)
  Day 1–7                           1.3 (1.0–1.7)           1.1 (0.9–1.6)                                       0.130

Other outcomes
Need for RRT (number (percentage) of patients)
  Day 1–7 after transplant          26 (21.5%)              19 (18.8%)                2.7% (−7.9 to 13.2%)      0.621
  30 days                           27 (22.3%)              20 (19.8%)                2.5% (−8.2 to 13.3%)      0.648
  6 months                          27 (22.3%)              21 (20.8%)                1.5% (−9.3 to 12.4%)      0.784
Duration of RRT day 1–7             4 (2–6)                 5 (4–6)                                             0.346
Length of hospital stayᶠ            15 (10–24)              15 (11–24)                                          0.926
Length of ICU stayᶠ                 4 (2–7)                 4 (3–7)                                             0.339
Graft survival at 1 year            0.950 (0.893–0.977)     0.960 (0.897–0.985)                                 0.707
Patient survival at 1 year          0.958 (0.902–0.982)     0.970 (0.909–0.990)                                 0.671

CI, confidence interval.
ᵃTotal number of livers transplanted and analysed overall. Primary outcome analysed on n = 220 due to unavailability of AST values during the first seven days after transplant. Specific outcomes may
have different denominators due to some missing data.
ᵇEffect reported is: percentage reduction (from geometric mean ratio) for peak AST; odds ratio for early allograft dysfunction; difference in proportions (%) for discard rates, post-reperfusion syndrome
and need for renal replacement therapy (RRT); not reported for outcomes for which medians are reported, for survival scores and for tests for interactions of subgroup analysis (only P values are
reported).
ᶜIntention to treat (ITT) analysis was adjusted for donor type and transplant centre.
ᵈThe denominator for the discard rates is the total number of livers retrieved (n = 270 (NMP, n = 137; SCS, n = 133)).
ᵉTest not performed due to few events and no events in one arm.
ᶠMedian and IQR are reported; a non-parametric Mann–Whitney U-test was used.


Similarly, there was no significant difference in the rate of anasto- 
motic strictures for DBD (NMP 40.7% (22 out of 54) versus SCS 41.8% 
(23 out of 55); P=0.909) or DCD (NMP 48.1% (13 out of 27) versus 
SCS 57.9% (11 out of 19); P=0.515) livers. 


Hospital stay, graft and patient survival 
There was no difference in median intensive care unit (ICU) stay 
(4 days NMP versus 4 days SCS; P=0.339), hospital stay (15 days NMP 
versus 15 days SCS; P=0.926) or the need for renal replacement therapy 
in the first postoperative week (2.7%, 95% confidence interval -7.9 to 
13.2%; P=0.621). 

One NMP liver developed PNF (see Supplementary Information).
There were no PNF cases in the SCS arm. Overall 10 recipients died
during follow-up, producing a one-year survival of 0.949 (95% confidence
interval 0.890–0.977) in the NMP group and 0.958 (95% confidence
interval 0.902–0.982) in the SCS group (P = 0.901). Two deaths
in the SCS group and three deaths in the NMP group were due to graft
failure.

Graft survival at one year was 0.950 (95% confidence interval
0.893–0.977) and 0.960 (95% confidence interval 0.897–0.985) in the NMP
and SCS groups, respectively (P = 0.695). The causes of graft failure
in the SCS arm were hepatic artery thrombosis (n = 3) and ischaemic
cholangiopathy (n = 1). The causes of graft failure in the NMP arm were
hepatic artery thrombosis (n = 2), ischaemic cholangiopathy (n = 1),
non-thrombotic infarction (n = 1), inferior vena cava occlusion (n = 1)
and PNF (n = 1) (Extended Data Fig. 2a, b).
For more detailed analysis of trial outcomes please see Supplementary
Information.


Perfusion characteristics indicative of organ quality 

The following continuously monitored parameters (mean ± s.d.) by the
third hour of NMP were measured for all livers that went on to be
successfully transplanted (Extended Data Figs. 3, 4). The measured
haemodynamic parameters were: hepatic artery flow (280 ± 120 ml min⁻¹)
and portal vein flow (1.1 ± 0.2 l min⁻¹). The measured metabolic
parameters were: pH (7.31 ± 0.17) and lactate clearance from
9.99 ± 3.13 mmol l⁻¹ at 15 min NMP to 0.93 ± 0.63 mmol l⁻¹ by 4 h
NMP. The measured synthetic parameter consisted of bile production
(9.17 ± 11.16 ml h⁻¹). Notably, 18 transplanted NMP livers produced
no/minimal bile during perfusion. All but one of these functioned
after transplant. There was no correlation between bile production and
post-transplant liver function or later development of non-anastomotic
biliary strictures.

One NMP liver developed PNF. This liver was persistently acidotic
with lactate > 4 mmol for the duration of NMP. No other liver with
these characteristics was transplanted.

Following transplant, 28 livers displayed minimal preservation
injury (MPI; peak AST < 250 IU l⁻¹) and 25 showed evidence of
severe preservation injury (SPI; peak AST > 1,000 IU l⁻¹). The donors
in these groups were well-matched in all characteristics other than
sex (Extended Data Table 4). During NMP, there was a difference in
baseline perfusate alanine aminotransferase (ALT) (MPI 171 IU l⁻¹
versus SPI 669 IU l⁻¹; P = 0.005) and lactate dehydrogenase (LDH)
(MPI 1,073 IU l⁻¹ versus SPI 1,838 IU l⁻¹; P = 0.01) between the two
groups. Levels of these enzymes, as well as γ-glutamyltransferase (γGT),
increased more rapidly during the first 8 h of NMP in the SPI group
(ALT, an increase of 56 IU l⁻¹ versus an increase of 461 IU l⁻¹, P < 0.001;
LDH, an increase of 483 IU l⁻¹ versus an increase of 980 IU l⁻¹,
P = 0.06; γGT, an increase of 23 IU l⁻¹ versus an increase of 104 IU l⁻¹,
P = 0.004). MPI livers showed a reduction in measurable levels of
haemolysis (haemolysis index) as NMP progressed, in contrast to SPI
livers in which the levels of haemolysis rose (MPI, a decrease of 0.04 U
versus SPI, an increase of 0.09 U; P = 0.03). Bile production was greater
in the MPI group (MPI 13.1 ml h⁻¹ versus SPI 7.8 ml h⁻¹; P = 0.03).
Lactate clearance was similar in each group. Post-reperfusion syndrome
was less common in the MPI group (MPI 0% (0 out of 28) versus SPI
24% (6 out of 25); P = 0.007). One NMP liver with perfusate
transaminases in excess of 20,000 IU l⁻¹ was transplanted successfully.


Adverse events 

The proportion of patients for whom adverse events were reported
(Extended Data Tables 5a–c, 6) was similar in the two arms (55.4%
NMP, 95% confidence interval 46.1–64.4% versus 57.4% SCS, 95%
confidence interval 47.2–67.2%), with a larger total number of events
reported for SCS livers (128 NMP versus 164 SCS). Serious adverse
events (Clavien–Dindo grade > IIIb) made up a greater proportion of the
reported events in the SCS arm than in the NMP arm (16.4% NMP (21 out
of 128) versus 22% SCS (36 out of 164)). No statistical tests were
applied to these data.


Discussion 

To our knowledge, this is the first randomized controlled trial to com- 
pare any type of machine perfusion technology with conventional static 
cold storage in human liver transplantation. 

The trial demonstrated significant reductions in peak AST and EAD 
rates in NMP livers; this is of clinical relevance as both are clinically 
accepted biomarkers for long-term graft and patient survival!*!%. 
These benefits are consistent with previous animal work"® and the 
phase-I clinical study? that preceded this trial, both of which showed 
post-transplant AST reductions in NMP livers. No differences were 
seen in graft or patient survival: a much larger trial is required to test 
this outcome. It is notable that these reductions in peak AST and EAD 
rates were achieved in the context of improved organ utilization and 
longer preservation times, both of which have implications in terms 
of addressing the donor shortage and logistical barriers that currently
limit liver transplants. 

DCD donors represent a largely untapped source of organs, com- 
prising 42% of UK deceased donors, but only 21% of transplanted 
livers’. Utilization of DCD livers is limited by poorer outcomes (PNF 
and ischaemic cholangiopathy) compared with DBD livers. Allowing for
the limitations of small-group analyses, in this study NMP DCD liver
primary outcome data were superior to those of both DCD and DBD
livers preserved using SCS. In fact, the primary outcome of DCD NMP 
livers was superior to that of DBD livers preserved by NMP: this was 
possibly owing to a selection bias, both of donors (lower threshold 
to decline DCD donors) and recipients (fitter patients selected for 
higher-risk organs). The AST differences are in the context of longer 
functional warm ischaemia times, longer preservation times and fewer 
organ discards in the NMP arm, suggesting that NMP may be achieving 
the desired objective of increasing organ utilization without compro- 
mising outcome. If these findings were translated into clinical practice, 
the increase in organ utilization would have substantial implications 
for waiting list mortality, which is currently approximately one in five 
patients!. 

The longer preservation times in the NMP group were not planned, 
but were all within the maximum perfusion time defined in the pro- 
tocol. There was no stipulation in the trial protocol that the preser- 
vation times should be matched. As clinicians gained experience, it 
appeared that some centres had started to organize their operating 
schedule according to the preservation method, although no overall 
difference between arms was seen in the proportion of transplants 
occurring in daylight hours. If, as appears to be the case, NMP can 
safely extend preservation times without compromising outcomes, this 
will have implications for operating department planning as well as 
organ utilization. 

There were over 50% fewer discarded organs in the NMP group, 
resulting in 20% more transplanted livers (121 NMP versus 101 SCS). 
The SCS discard rate of 23.7% was higher than the 17% reported in 
UK registry data‘, and may reflect the high proportion of DCD livers 
enrolled in the trial; the discard rate of retrieved DCD livers in the UK 
is 30%“. This reported difference in organ utilization is likely to be an 
underestimation of the full potential impact that NMP could have on 
transplant numbers. The trial stipulated that only livers considered 
transplantable according to standard practice could be enrolled. For 
the full extent of improved organ utilization to be measured, livers 
would need to be randomized to NMP or SCS before being offered for 
transplant; this should form the basis of a future study. An increase of 
20% or more in the number of transplantable donor livers would have 
a transformative effect on the mortality on liver transplant waiting lists 
around the world. 

The haemodynamic characteristics of the NMP recipients following 
reperfusion were measurably superior to those of SCS recipients, in line 
with previously reported findings’>. This did not translate into a differ- 
ence in ICU stay, hospital stay or need for renal replacement therapy 
between the two groups, despite previous reports showing a correlation 
between peak AST and renal replacement therapy'®. The magnitude of 
the reperfusion syndrome is a factor in determining the eligibility of the 
sickest patients for high-risk organs, due to the limited capacity of such 
patients to tolerate cardiovascular instability; NMP might therefore 
increase the options for the most urgent patients. 

Perhaps the greatest limitation to more widespread utilization of 
DCD livers is the high rate of clinically important non-anastomotic 
biliary strictures (NAS) which lead to a high rate of graft failure; this 
is believed to develop due to the vulnerability of the biliary tree to 
prolonged warm ischaemia. The rate of NAS in the NMP DCD group 
(11.1%) was lower than in SCS (26.3%) livers, despite longer functional 
warm ischaemia times. This did not reach statistical significance, which 
may be a function of sample size; the trial was not powered for this 
outcome. Reported rates of NAS in DCD transplants vary from 10 
to 30%'”'8, but these are in patients with symptoms suggesting bil- 
iary pathology, rather than those only apparent on imaging; biliary 
investigations are usually only performed for clinical indications 
(typically deranged liver function). Prior to this study, the radiolog- 
ical incidence of both anastomotic and non-anastomotic strictures in 
asymptomatic patients was unknown; in particular, there is no real 
benchmark against which to compare the rate of NAS seen in the DCD 
SCS group. Apart from the two patients retransplanted for ischaemic 
cholangiopathy, almost all of the remaining patients with radiological 
evidence of NAS had normal liver function at one year; this questions 
the clinical relevance of a protocol MRCP at six months. The longer- 
term follow-up of these patients will shed light on the importance of 
a radiological diagnosis of biliary stricturing in patients with normal 
graft function, and the role of MRCP as an endpoint in future trials. 

As well as demonstrating improved graft preservation, this trial 
tested the feasibility, usability and safety of NMP, a vital component of 
the evaluation of any new technology. It showed that the logistical chal- 
lenges of NMP can be met successfully within clinical practice. Over 
120 NMP livers were transplanted in seven transplant centres across 
four European countries. Nonetheless, adoption of this technology 
into clinical practice may necessitate changes in the organ retrieval 
process, particularly with respect to technical support and transport 
arrangements. It remains to be seen whether NMP is required for the 
full duration of an organ’s preservation or can equally well be applied 
after a short period of SCS when the organ reaches the transplanting 
centre—this would simplify the logistics but may not be suitable for 
the most marginal organs!’. A phase-II study to test this has recently 
completed enrolment in the UK (NCT03176433). 

For this new technology to be supported by healthcare funders, a 
health-economic case is needed. The results of this study suggest that 
benefits will accrue not only from improved early graft function and 
transplantation logistics, but also from improved utilization. Secondary 
economic benefits will accrue from logistic changes, enabling trans- 
plants to be moved predominantly into daytime operating, with reduc- 
tion in staffing costs and likely improvements in outcome. More timely 
intervention will also bring economic benefits—earlier transplantation 
is associated with lower morbidity and cost. 

The effects of NMP demonstrated in this study are unequivocal with 
respect to the primary endpoint, implying a benefit in livers currently 
used for transplantation. However, the greatest benefit may be realized 
by applying this technology to livers outside current acceptance crite- 
ria, in order to transplant organs currently deemed untransplantable. 
Algorithms to assess organ viability, based on data obtained during 
NMP, will be essential if this potential is to be realized'°. This study 
sheds some light on which perfusion parameters may be used to assess 
organ quality: bile production, acid-base stability, lactate clearance, 
perfusate transaminase levels, falling measurable haemolysis—all 
correlate with the degree of preservation injury evident after trans- 
plant. However, all but one of the livers transplanted in the NMP group 
functioned postoperatively, including one NMP liver with perfusate 
transaminases in excess of 20,000 IU l⁻¹ and 18 livers with minimal 
bile production. Data from much larger numbers of NMP transplants 
(typically from a registry) would be required to determine specific 
markers of viability. 

The importance of bile production during NMP is unclear. 
Preliminary evidence from our group” suggests that preservation 
injury causes impaired hepatocellular uptake of bile salts. We have 
shown evidence of progressive accumulation of bile salts in the perfu- 
sate of livers with high post-transplant transaminase levels; something 
that also correlates with poor bile production during NMP. The extent 
and nature of the injury required to produce this effect is not clear but 
does appear to reflect organ quality rather than viability. 

High-risk organs (for example, those with steatosis) may benefit from 
therapeutic interventions delivered during NMP: several groups are 
exploring potential strategies, including stem cell treatments, de-fatting 
agents and immunological modification of the organ. Future trials 
may be needed to formally test the size of the effect of NMP on organ 
utilization; for this it will be necessary to randomize livers at the time 
of organ offering rather than the time of retrieval. Organ utilization, or 
organ utilization with 12-month graft survival (functional utilization) 
would be a logical primary endpoint for a study of this sort. 

This study describes the formal clinical evaluation of a novel tech- 
nology in liver transplantation, and could herald the start of a new era 
of intervention during organ preservation. It represents a first, neces- 
sary step in demonstrating that NMP is feasible, safe and effective in 
clinical practice; the fact that the study has definitively met its primary 
endpoint should now enable the exploration of the technology’s wider 
potential. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0047-9. 
METHODS 


Study design. This investigator-led, multinational, open-label, two-arm, paral- 
lel randomized controlled trial included seven liver transplant centres from the 
UK (Addenbrooke's Hospital, Cambridge; King’s College Hospital, London; 
Queen Elizabeth Hospital, Birmingham; Royal Free Hospital, London), Belgium 
(Universitaire Ziekenhuizen, Leuven), Spain (Hospital Clínic de Barcelona, 
Barcelona) and Germany (Universitätsklinikum, Essen), and was part of the 
EU-funded Consortium for Organ Preservation in Europe (COPE, http://www. 
cope-eu.org/). Approval was obtained from national research ethics commit- 
tees and medical device regulatory bodies in each trial region, in particular the 
London-Dulwich National Research Ethics Committee (NREC) and the Medicines 
and Healthcare products Regulatory Agency (MHRA) in the UK; the Federaal Agentschap 
voor Geneesmiddelen en Gezondheidsproducten (FAGG) and the Commissie 
Medische Ethiek of Universitaire Ziekenhuizen, Leuven, Belgium; the Comité 
Ético de Investigación Clínica of the Hospital Clínic de Barcelona; the Deutsche 
Ärztekammer and the ethics committee of University Hospital Essen, Germany. 
The trial protocol was registered before recruitment (ISRCTN 39731134). All rel- 
evant ethical regulations relating to the conduct of this study were followed at 
each trial site. The trial is reported in accordance with the CONSORT statement”). 

No major amendments were made to the trial design after the start of recruit- 
ment. 

Eligibility and consent. Inclusion criteria for donors and recipients were deliber- 
ately broad to represent the full spectrum of clinical practice. Whole livers from 
DBD and DCD (Maastricht category III**) donors at least 16 years of age were 
eligible. Specific donor consent was not required for trial inclusion. No organs 
were procured from prisoners. Recipients were eligible provided they were at least 
18 years old and listed for a liver-only transplant, excluding those with fulminant 
liver failure, owing to the poor prognosis of this group regardless of organ quality. 
Potential participants were consented while on the waiting list; consent was 
affirmed on the day of transplantation. The consent included the recording of 
anonymized data for trial purposes and the collection of biological samples for 
storage in the trial biobank (see ‘Sample collection’). No patient identifiable data 
were collected. 

Randomization. Once an eligible donor organ was allocated to a consented recip- 
ient and the availability of the NMP device and team was confirmed, the liver 
was randomized. All clinical decisions thereafter, including graft suitability and 
procedure scheduling, were made independently of the trial team. 

Using an online randomization tool, livers were assigned to NMP or SCS with 
1:1 allocation ratio as per a computer-generated randomization schedule, using 
variable block size, stratified by transplant centre and donor type (DBD/DCD). 
The unit of randomization was donor livers rather than recipients, but analysis is 
reported for the transplant recipients. 
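
The allocation scheme described above (1:1 ratio, variable block size, stratification by transplant centre and donor type) can be sketched as follows. This is a minimal illustration only, not the online tool used in the trial: the class name, the block sizes (2, 4, 6) and the seed handling are assumptions, since the schedule parameters are not given here.

```python
import random
from collections import defaultdict

ARMS = ("NMP", "SCS")


def make_block(rng, block_sizes=(2, 4, 6)):
    """Return one shuffled block holding equal numbers of each arm.

    The block sizes are hypothetical; the source only says they varied.
    """
    size = rng.choice(block_sizes)
    block = list(ARMS) * (size // 2)
    rng.shuffle(block)
    return block


class StratifiedBlockRandomizer:
    """Permuted-block randomization with a separate sequence per stratum."""

    def __init__(self, seed=None, block_sizes=(2, 4, 6)):
        self._rng = random.Random(seed)
        self._block_sizes = block_sizes
        self._pending = defaultdict(list)  # stratum -> unused allocations

    def assign(self, centre, donor_type):
        """Allocate the next liver in the (centre, donor type) stratum."""
        stratum = (centre, donor_type)
        if not self._pending[stratum]:
            # Current block exhausted: draw a fresh block for this stratum.
            self._pending[stratum] = make_block(self._rng, self._block_sizes)
        return self._pending[stratum].pop()
```

Because each stratum draws from its own blocks, the two arms stay balanced within every centre and donor type, and the varying block length makes the next allocation harder to predict.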

Static cold storage group. Livers randomized to SCS were retrieved, preserved, 
transported and transplanted according to local standard practice. 

Normothermic machine perfusion group. The OrganOx metra normothermic 
liver perfusion device was used (Extended Data Fig. 5a), which enables automated 
organ preservation for up to 24h. Following randomization to NMP, the device and 
accompanying researcher were transported to the donor hospital. The device was 
set-up during the retrieval procedure, as has been previously described’. A sterile 
disposable set was installed on to the device and primed with 500 ml gelofusine 
(B. Braun Ltd) and three units of donor-matched packed red blood cells. Antibiotics 
were given at the outset and heparin, insulin, prostacyclin, bile salts and fat-free 
parenteral nutrition were infused during the perfusion (Extended Data Fig. 5b). 

Following retrieval of the donor organ, and while still at the donor hospital, the 
liver back-table operation was performed’, followed by cannulation of the hepatic 
artery, portal vein, inferior vena cava and bile duct. The liver was connected to the 
NMP device and perfusion commenced (Fig. 1). During the early part of the per- 
fusion sodium bicarbonate was added incrementally to achieve a physiological pH. 
The OrganOx metra perfusion device incorporates online blood gas measurement 
(Terumo CDI 500) together with software-controlled algorithms to control PO₂ 
and PCO₂ (within physiological limits), temperature (37 °C), mean arterial pressure 
(65-75 mm Hg) and inferior vena cava pressure (0-2 mm Hg). Typical blood 
flows of 200-400 ml min⁻¹ (artery) and 1,000-1,200 ml min⁻¹ (portal vein) were 
obtained. Glucose was measured manually and the value entered into the device. 
If glucose fell below 10 mmol l⁻¹, this automatically triggered the infusion of a 
fat-free TPN mixture (Nutriflex Special, B. Braun Ltd) into the perfusate. 

NMP continued throughout the duration of transport and storage until the 
transplanting team were ready to implant the liver. The minimum protocol-stipulated 
NMP duration was 4h, the time needed for ATP repletion in animal studies". 
The maximum allowed NMP duration was 24h in line with the experience in the 
phase-I study and the regulatory approval for the device’. 

Sample collection. Tissue biopsies (donor liver and bile duct), recipient blood 
and urine were collected at pre-specified time points from every liver/transplanted 
patient in the study. In addition to these, samples of perfusate fluid and bile were 
collected from every NMP liver. These were stored in a central biobank established 
by the COPE Consortium for use in ongoing mechanistic studies. Each sample was 
allocated a unique bar code, which the biobank coordinator was able to match to 
a specific trial identification number. No patient identifiable data were associated 
with each sample. 

Study end points. The primary endpoint was defined as the difference between the 
two treatment arms in the peak level of serum AST within seven days after trans- 
plant. This is a clinically accepted biomarker, predictive of primary non-function 
as well as graft and patient survival!?4 and is also associated with histological 
evidence of moderate to severe perfusion injury’>”®. 

A surrogate marker of graft survival was used in this trial for two reasons: (1) 
the relatively high survival rates in liver transplantation (> 90%) and (2) the mul- 
tifactorial causes of graft loss. A trial based directly on graft or patient survival 
would have had to be unfeasibly large. 

In order to ensure consistency and to minimise the hypothetical AST ‘wash-out’ 
effect in the NMP-treated organs, the first post-transplantation value was measured 
between 12 and 24h after reperfusion. 

Secondary end points included: (1) organ discard rate (after retrieval); (2) 
post-reperfusion syndrome”’: > 30% drop in mean arterial pressure persisting 
for > 1 min within five minutes of reperfusion; (3) primary non-function: irre- 
versible graft dysfunction, for non-technical and non-immunological causes, lead- 
ing to death or emergency liver replacement during the first 10 days after liver 
transplantation; (4) early allograft dysfunction! as indicated by any one of the 
following clinical indicators: (i) bilirubin > 170 µmol l⁻¹ on day 7 after transplant; 
(ii) INR > 1.6 on day 7 after transplant; (iii) peak AST > 2,000 IU l⁻¹ during the 
first 7 days; (5) length of hospital and ICU stays; (6) need for renal replacement 
therapy; (7) evidence of cholangiopathy on MRCP at six months; (8) graft and 
patient survival at one year. 
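
Two of the secondary end points are threshold rules and can be written out explicitly. The sketch below encodes the definitions given above; the function names and the input format for blood-pressure readings (time-stamped samples after reperfusion) are assumptions made for illustration, and "within five minutes" is interpreted here as the qualifying drop being observed in the first 300 s.

```python
def early_allograft_dysfunction(bilirubin_day7_umol_l, inr_day7, peak_ast_iu_l):
    """True if any one of the three criteria listed in the text is met."""
    return (
        bilirubin_day7_umol_l > 170   # bilirubin > 170 umol/l on day 7
        or inr_day7 > 1.6             # INR > 1.6 on day 7
        or peak_ast_iu_l > 2000       # peak AST > 2,000 IU/l in first 7 days
    )


def post_reperfusion_syndrome(baseline_map, readings):
    """Detect a > 30% fall in mean arterial pressure lasting > 1 min.

    readings: list of (seconds_after_reperfusion, map_mm_hg), sorted by time.
    Only samples from the first five minutes (300 s) are considered.
    """
    threshold = 0.7 * baseline_map
    run_start = None  # time at which the current low-pressure run began
    for seconds, value in readings:
        if seconds > 300:
            break
        if value < threshold:
            if run_start is None:
                run_start = seconds
            if seconds - run_start > 60:
                return True
        else:
            run_start = None
    return False
```

With a baseline of 80 mm Hg the threshold is 56 mm Hg, so a run of readings below 56 mm Hg spanning more than 60 s counts as post-reperfusion syndrome, whereas a 30 s dip does not.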

Full details of all secondary outcomes are available in the trial protocol”®. 

Six-month MRCP. An MRCP scan was performed six months (range 
5-7 months) after transplant to evaluate the biliary tree for features of cholangiopathy 
evident by biliary strictures. All scans were reviewed by two independent radio- 
logists blinded to the method of organ preservation with disparities adjudicated 
by a third radiologist. Owing to the lack of any existing grading system for biliary 
strictures, a system was agreed to in advance by consensus among the radiolo- 
gists to allow definitive categorization of the presence and site of strictures. The 
findings were reported as follows: (1) normal biliary tree; (2) anastomotic stric- 
ture (> 70% of luminal diameter); (3) unequivocal evidence of non-anastomotic 
stricture anywhere in the biliary tree; (4) both anastomotic and non-anastomotic 
biliary strictures. 

Statistical analysis. Previous data from Universitaetsklinikum Essen, Germany 
(A.Pau. & S.R.K., unpublished observations), demonstrated the geometric mean of 
peak AST to be 608.59 IU l⁻¹ in patients transplanted following SCS. The present 
study was powered to detect a (clinically relevant) 33% reduction in peak AST 
with 90% power at a 5% significance level, requiring 220 transplanted livers (110 
per arm). 
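
For orientation, the kind of calculation behind such a power statement can be sketched with the standard normal-approximation formula for comparing two means on the log scale. This is not the trial's actual calculation: the function name is hypothetical, and the standard deviation of ln(peak AST) must be supplied by the user because the variability assumed by the investigators is not stated here.

```python
import math
from statistics import NormalDist


def livers_per_arm(relative_reduction, sd_log, power=0.90, alpha=0.05):
    """Approximate sample size per arm for a two-sided, two-sample comparison.

    relative_reduction: targeted fall in geometric mean (0.33 for 33%).
    sd_log: assumed standard deviation of ln(peak AST) in each arm
            (a hypothetical input, not taken from the source).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)
    # A 33% reduction in geometric mean is a shift of -ln(0.67) on the log scale.
    delta = -math.log(1 - relative_reduction)
    n = 2 * ((z_alpha + z_beta) * sd_log / delta) ** 2
    return math.ceil(n)
```

The required number rises with the square of the assumed standard deviation, which is why the variability estimate from previous data matters as much as the targeted effect size.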

Results are reported as a modified intention-to-treat analysis. A per-protocol 
sensitivity analysis was also performed excluding livers that received machine per- 
fusion outside the protocol specified range (4-24h) and comparing the groups 
according to the treatment actually received. Livers randomized but not retrieved 
were excluded from the analysis. 

Primary outcome was analysed using ANOVA with adjustment for stratification 
factors. The peak AST was calculated for each recipient with at least two values 
available. Missing AST values were not imputed. Binary outcomes were assessed 
using test for proportions or logistic regression to adjust for potential confounders 
and report odds ratios. Continuous outcomes were compared using a Student’s 
t-test, if normally distributed, or by Mann-Whitney U-test otherwise. Time-to- 
event outcomes were analysed using Kaplan-Meier estimates and log-rank tests. 
Outcomes are reported with 95% confidence intervals and P values to three decimal 
places. P < 0.05 was regarded as statistically significant. 
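
The derivation of the primary outcome can be illustrated with a short sketch: the peak AST is taken per recipient when at least two values are available, and groups are then compared on the natural-log scale, which is equivalent to comparing geometric means. The helper names are hypothetical, and this unadjusted comparison omits the adjustment for stratification factors used in the trial's ANOVA.

```python
import math


def peak_ast(values):
    """Peak AST over the first 7 days, or None if fewer than two values exist.

    Missing measurements are passed as None and are not imputed.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None
    return max(observed)


def geometric_mean(xs):
    """Exponential of the mean of the natural logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def geometric_mean_ratio(treated, control):
    """Ratio of geometric means; values below 1 favour the treated group."""
    return geometric_mean(treated) / geometric_mean(control)
```

A ratio of 0.5, for example, corresponds to a 50% lower geometric mean peak AST in the treated group.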

Pre-specified subgroup analyses were performed for donor type (DCD versus 
DBD), donor risk index (ET-DRI) and MELD score. Tests for interaction were used 
to assess the consistency of the treatment effect across the different subgroups, 
and the results are reported using forest plots. 
The study was not powered to detect differences in the subgroups; these results 
should only be regarded as hypothesis-generating. 

Analyses were conducted using Stata version 14.2 (StataCorp). 

No formal interim analyses of end points were carried out. At regular intervals, 
an independent Data Monitoring Committee reviewed confidential reports cov- 
ering recruitment, safety parameters and primary end point data. 

Full details of the statistical methodology are available in the Supplementary 
Information. 

Machine perfusion parameters. During NMP continuous displays of pressures, 
flows, metabolic (pH) and synthetic (bile production) liver function were available 
to the operator. In addition, lactate measurements were carried out using external 
blood gas analysis. The trial protocol did not stipulate the manner in which these 
parameters should be interpreted. 

Once trial recruitment was complete, an ad hoc analysis was performed in which 
NMP organs were categorized according to those which, following transplantation, 
displayed minimal preservation injury (MPI; peak AST < 250 IU l⁻¹) and those 
with severe preservation injury (SPI; peak AST > 1,000 IU l⁻¹). Groups were 
compared for differences in donor and recipient characteristics, perfusate biochemistry, 
bile production and evidence of post-reperfusion syndrome. 

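
The grouping rule used for this analysis is a simple pair of cut-offs on post-transplant peak AST. A minimal sketch follows; the function name is hypothetical, and livers between the two thresholds are returned as None because the text assigns them to neither group.

```python
def preservation_injury_group(peak_ast_iu_l):
    """Classify an NMP liver by post-transplant peak AST (IU/l)."""
    if peak_ast_iu_l < 250:
        return "MPI"   # minimal preservation injury
    if peak_ast_iu_l > 1000:
        return "SPI"   # severe preservation injury
    return None        # intermediate values fall outside both groups
```

Comparing only the two extremes sharpens the contrast between groups at the cost of excluding the intermediate livers from the comparison.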
Adverse events. Reporting of adverse events was in accordance with the European 
Commission MEDDEV guidelines”. Following trial completion, these were 
reviewed by two independent clinicians blinded to the treatment arm. Adverse 
events with a Clavien-Dindo™ grading greater than IIIa were considered serious 
adverse events. Rates of adverse events are reported with 95% confidence intervals. 
No statistical tests were applied to these data. 
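
The rule for classifying serious adverse events can be expressed as an ordering on Clavien-Dindo grades. The sketch below assumes the seven grade labels that appear in Extended Data Table 5; the function names are hypothetical.

```python
# Clavien-Dindo grades in increasing order of severity, as listed in
# Extended Data Table 5.
GRADES = ["I", "II", "IIIa", "IIIb", "IVa", "IVb", "V"]


def is_serious(grade):
    """An event is serious if its grade is greater than IIIa."""
    return GRADES.index(grade) > GRADES.index("IIIa")


def count_serious(counts_by_grade):
    """Sum the event counts for all grades classed as serious."""
    return sum(n for grade, n in counts_by_grade.items() if is_serious(grade))
```

Applied to the NMP counts reported in Extended Data Table 5 for grades IIIb, IVa, IVb and V (8, 5, 3 and 5 events), the rule gives 21 serious adverse events, matching the total reported for that arm.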

Full details of the trial methodology are available in the clinical trial protocol”*. 

Reporting Summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The data that support the findings of this study are available from 
the corresponding author upon reasonable request. The full trial protocol, statis- 
tical analysis plan and final statistical report are available in the Supplementary 
Information. 
Subgroup   Obs   Geometric mean ratio (95% CI) 
DBD        167   0.598 (0.443, 0.807) 
DCD         53   0.267 (0.154, 0.463) 
Overall    220   0.493 (0.378, 0.641) 

Ratios below 1 favour NMP; ratios above 1 favour SCS. 

Extended Data Fig. 1 | Forest plot for subgroup analysis of peak AST by donor type. Geometric mean ratio and 95% confidence interval are reported 
for each subgroup and overall for all groups. DBD group, n = 87 NMP, n = 80 SCS; DCD group, n = 33 NMP, n = 20 SCS. 
Extended Data Fig. 2 | Post-reperfusion syndrome. a, Kaplan-Meier plot for one-year survival of patients with two-sided log-rank test. b, Kaplan-Meier 
plot for one-year graft survival with two-sided log-rank test. 
Extended Data Fig. 3 | Machine perfusion parameters during NMP. 
a, Hepatic artery flow during NMP. b, Portal vein flow during NMP. 
c, Perfusate pH during NMP. d, Bile production during NMP. 
a-d, Data are mean ± s.d. of each time point. Actual values are shown in 
the table. n = 87. 

b, Portal vein flow 
Time point       | Mean flow (litres/min) | Standard deviation 
First hour       | 1.118                  | 0.205 
Second hour      | 1.118                  | 0.202 
Third hour       | 1.124                  | 0.206 
End of perfusion | 1.008                  | 0.160 

c, Perfusate pH 
Time point       | Mean pH | Standard deviation 
First hour       | 7.178   | 
Second hour      | 7.301   | 
Third hour       | 7.313   | 
End of perfusion | 7.347   | 0.150 

d, Bile production 
Time point       | Mean flow (ml/hour) | Standard deviation 
First hour       |                     | 
Second hour      | 3.486               | 5.358 
Third hour       | 9.166               | 11.160 
End of perfusion | 10.050              | 11.036 
Extended Data Fig. 4 | Perfusate lactate levels during NMP. Scatter graph with trend line showing perfusate lactate levels at different time points during 
NMP for all transplanted livers. n = 94. 
Extended Data Fig. 5 | NMP device and circuit. a, OrganOx metra 
(generation 1). The NMP device used in the trial. b, OrganOx metra NMP 
circuit. The liver is perfused via the hepatic artery and portal vein. It 
drains via the inferior vena cava to a centrifugal pump through which the 
perfusate passes, via a heat exchanger/oxygenator, to a reservoir or directly 
into the hepatic artery. The perfusate in the reservoir drains under gravity 
into the portal vein. 
Extended Data Table 1 | Detailed breakdown of reasons for discard of NMP livers 


Discarded Liver No | Device Error | Device User Error | Poor Perfusion Parameters | Donor Malignancy | Donor Cirrhosis | Poor In-situ Perfusion | Prolonged Donor Warm Ischaemia | Liver Size | Steatosis | Further Details 

1 N Y N N N N N N Y Poor perfusion due to IVC cannula 
positioning. Safely converted to cold 
storage and discarded due to steatosis. 

2 N N N Y N N N Y Appearances consistent with cirrhosis in 
donor with known hepatitis C. 

3 N Y N N N N N N Poor hepatic artery flow during NMP 
with increasing lactate. 

4 N N Y N N N N N Incidental lung tumour found at retrieval 

5 N N N N N Y N Y Warm ischaemia time greater than 30 
minutes in DCD donor. 

6 N N N N Y Y N Y Warm ischaemia time greater than 30 
minutes in DCD donor. Poor in situ cold 
perfusion. 

7 N aa Y N N N N N Y Persistently raised lactate >6mmol after 
6 hours NMP. 

8 N N Y N N N N Y Colonic tumour found at retrieval. 

9 N N N N N N N Y 60% steatosis on biopsy. 

10 N Y N N N N N N Persistent acidosis with lactate >6mmol 
after 8hours NMP. 

11 N N N N N N Y Y Large steatotic liver, no size-matched 
recipient found. 

12 N Y N N N N N Y Persistently raised lactate >3mmol with 
acidosis. 

13 N N N N N N N N Y Moderate steatosis on biopsy, surgeon 
decision to discard. 

14 N N Y N N N N N Y Poor hepatic artery and portal vein flow 
during NMP in a steatotic liver. 

15 N Y N N N N N N Y Excessive bleeding during NMP from 
phrenic veins and hepatic artery in a 
steatotic liver. Safely converted to cold 
storage and declined due to steatosis. 

16 Y N N N N N N N Y See Supplementary Information for 
narrative description of device error. 
Extended Data Table 2 | Post-reperfusion syndrome analysis 

a 
Post-reperfusion syndrome | NMP         | SCS 
No                        | 106 (87.6%) | 65 (67.0%) 
Yes                       | 15 (12.4%)  | 32 (33.0%) 
Difference = -20.6% (95% C.I. -31.6%, -9.5%), p = 0.000 

b 
Post-reperfusion lactate | NMP | SCS | p-value 
Median                   | 3.6 | 4.1 | 0.018 
IQR                      |     |     | 

c 
Vasopressor use                                 | NMP        | SCS 
Requiring pre-reperfusion vasopressor infusion  | 92 (76.0%) | 82 (81.2%) 
(missing)                                       | 2 (1.7%)   | 6 (5.9%) 
(missing)                                       | 4 (3.3%)   | 9 (8.9%) 
Requiring post-reperfusion vasopressor infusion | 65 (53.7%) | 80 (79.2%) 
(missing)                                       | 1 (0.8%)   | 9 (8.9%) 

a, Post-reperfusion syndrome by treatment group. Frequencies and column percentages are reported. Difference in proportions was tested using a Fisher's test for proportions. b, Difference in 
post-reperfusion lactate in the recipient in each treatment arm. This relates to the first lactate measurement recorded by the anaesthetist after liver reperfusion and occurred within 30 min of 
reperfusion. Analysis using non-parametric Mann-Whitney U-test. IQR, inter-quartile range. c, Difference in use of vasopressor medications before, during and after liver reperfusion in the recipient. Percentage 
of total events are reported in brackets. Details of the specific vasopressors that were used were not recorded. 
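
The reported difference in the rate of post-reperfusion syndrome (15 of 121 versus 32 of 97 recipients with data) and its confidence interval can be checked with a normal-approximation (Wald) interval for a difference in proportions. The method used to obtain the published interval is not stated, so the choice of the Wald interval is an assumption; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference(events_a, n_a, events_b, n_b, level=0.95):
    """Difference in proportions (a minus b) with a Wald confidence interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_a - p_b
    return diff, diff - z * se, diff + z * se


diff, lower, upper = risk_difference(15, 121, 32, 97)
print(f"{100 * diff:.1f}% ({100 * lower:.1f}%, {100 * upper:.1f}%)")
```

Under this assumption the calculation gives -20.6% (-31.6%, -9.5%), in agreement with the values reported in panel a.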
Extended Data Table 3 | Extended primary outcome analysis 

a 
                                   | NMP (N = 120)              | SCS (N = 100)               | Difference / Mean ratio* [% reduction] 
Mean ln Peak AST (95% C.I.)        | 6.191 (6.013, 6.368)       | 6.872 (6.678, 7.066)        | -0.681 (-0.946, -0.417) 
Geometric Mean Peak AST (95% C.I.) | 488.142 (408.856, 582.804) | 964.934 (794.471, 1171.972) | 0.506 (0.388, 0.659) [49.4% (34.1%, 61.2%)] 

b 
Donor type | Obs | Geometric mean ratio | [95% Conf. Interval] | p-value 
DBD        | 167 | 0.598                | (0.443, 0.807)       | 0.001 
DCD        | 53  | 0.267                | (0.154, 0.463)       | 0.000 

a, Primary outcome results from the unadjusted analysis. Sample size for analysis of primary outcome is n = 220. A Student's t-test was used. *First cell in this column refers to the mean difference in 
natural logarithm of peak AST (variable used to run the analysis models). The second cell in this column refers to the geometric mean ratio of the peak AST, used to look at the reduction in the original 
measurement. b, Treatment effect on peak AST for donor type subgroups. Sample size for subgroup analysis is n = 220 (DBD group, n = 87 NMP, n = 80 SCS; DCD group, n = 33 NMP, n = 20 SCS). A 
Student's t-test was used. No power calculation or adjustment was made for subgroup analysis. 
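
The relationship between the two rows of panel a can be made explicit: exponentiating the mean difference in ln(peak AST) and its confidence limits gives the geometric mean ratio, and one minus that ratio gives the percentage reduction. The mean difference is 6.191 - 6.872 = -0.681, with the sign implied by its confidence interval (-0.946, -0.417). The function names below are hypothetical.

```python
import math


def ratio_from_log_difference(diff, lower, upper):
    """Back-transform a log-scale difference and its limits to a ratio."""
    return math.exp(diff), math.exp(lower), math.exp(upper)


def percent_reduction(ratio):
    """Percentage reduction implied by a ratio of geometric means."""
    return 100 * (1 - ratio)


ratio, lo, hi = ratio_from_log_difference(-0.681, -0.946, -0.417)
print(f"ratio {ratio:.3f} ({lo:.3f}, {hi:.3f}); "
      f"reduction {percent_reduction(ratio):.1f}%")
```

This reproduces the tabulated geometric mean ratio of 0.506 (0.388, 0.659) and the 49.4% reduction. The upper limit of the ratio corresponds to the smallest reduction (34.1%) and the lower limit to the largest (61.2%).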
Extended Data Table 4 | Characteristics and perfusate analysis of livers included in NMP liver quality model development 

a 
                     | MPI (n = 28)  | SPI (n = 25) 
NMP duration (mins)  | 592.8 ± 51.10 | 633.7 ± 52.49 
Donor Type (DBD/DCD) | 19/9          | 20/5 
Donor Age (years)    | 56.2 ± 3.42   | 51.0 ± 3.02 
Donor Sex            | 9M; 19F       | 18M; 7F 
ET-DRI               | 1.83 ± 0.22   | 1.76 ± 0.16 
Recipient Age        | 52.2 ± 2.11   | 56.0 ± 2.77 
Recipient Sex        | 17M; 11F      | 19M; 6F 
MELD                 | 15.0 ± 1.09   | 13.5 ± 1.28 


3.44 13.42 5.02 5.02 18.9+ 
0.19 | 0.23 0.34 0,34 1.99 
2 (2) | 2 (2-10) 2 (2-28) 2 (2- 
108) 
ALT 170.5] 669 193.5{78- [570 268.5 
(64- | (58- 2306) (78- 
1811)] 4390) 4809) 
7 (S- 
Alkaline 21) 7 (5-32) | >0.999 7 (5-45) 10 (5- 
Phosphata 
se 105) 
4,5 (4- 5 (4-103) 108 (8- 0.008 23 (0 104 (4- 0.0039 
GGT 92) 759) 183) 667) 
1073 
LOH : 1838+ 1884 + 14792 2610+ 482.82 | 980.14 
180 | 245 207.8 157.2 187.6 110.6 278.7 


CRP 8.3+]862 22.1% 147.4% 131.92 189.2+ 
0.87 | 1.47 2.83 24.39 27.28 42.74 
10.5 
z 10.7 + 3.02 0.Sz 0.9+ 8.6 
0.61 [0.78 0.78 0.09 0.29 0.64 


a, Demographic data for organ quality model development livers. Demographic data for minimal preservation injury (MPI; peak AST < 250 IU l⁻¹; n = 28) and significant preservation injury (SPI; peak 
AST > 1,000 IU l⁻¹; n = 25) groups. Continuous variables were analysed using an unpaired Student's t-test and categorical variables using Fisher's exact test. Data are mean ± s.e.m. b, Comparison of 
NMP perfusate analyses between MPI (peak AST < 250 IU l⁻¹; n = 28) and SPI (peak AST > 1,000 IU l⁻¹; n = 25) groups. A D'Agostino-Pearson normality test was performed to assess data distribution. 
Parametric data were analysed using an unpaired Student's t-test and non-parametric data were analysed using a Mann-Whitney U-test. Parametric data are presented as mean ± s.e.m. and 
non-parametric data are presented as median and range. 
Extended Data Table 5 | Adverse events analysis 

a 
Patients with     | NMP            | SCS            | Total 
No adverse events | 54 (44.6%)     | 43 (42.6%)     | 97 (43.7%) 
Adverse events    | 67 (55.4%)     | 58 (57.4%)     | 
(95% C.I.)        | (46.1%, 64.4%) | (47.2%, 67.2%) | 

b 
Clavien-Dindo grading | NMP        | SCS        | Total 
I                     | (11.7%)    | (18.3%)    | 45 (15.4%) 
II                    | 64 (50.0%) | 72 (43.9%) | 136 (46.6%) 
IIIa                  | 28 (21.9%) | 26 (15.9%) | 54 (18.5%) 
IIIb                  | 8 (6.3%)   |            | 17 (5.8%) 
IVa                   | 5 (3.9%)   |            | 20 (6.9%) 
IVb                   | 3 (2.3%)   |            | 12 (4.1%) 
V                     | 5 (3.9%)   |            | 8 (2.7%) 
Total                 |            |            | 292 

c 
Classification | NMP         | SCS         | Total 
AE             | 107 (83.6%) | 128 (78.1%) | 235 (80.5%) 
SAE            | 21 (16.4%)  | 36 (22.0%)  | 57 (19.5%) 
Total          | 128         | 164         | 292 

a, Number of patients with any adverse events reported in each trial arm. The percentage of total events is reported in brackets. No statistical tests have been applied. b, Adverse events were 
categorized by Clavien-Dindo grade. Breakdown of adverse events in each trial arm according to Clavien-Dindo grading. The percentage of total events is reported in brackets. Adverse events with 
Clavien-Dindo grading ≥IIIb were categorized as serious adverse events. No statistical tests have been applied. c, Breakdown of adverse events and serious adverse events in each trial arm. The percentage of 
total events is reported in brackets. Adverse events with Clavien-Dindo grading ≥IIIb were categorized as serious adverse events. No statistical tests have been applied. 
Extended Data Table 6 | Detailed breakdown of adverse events in each trial arm 


Event Category NMP | SCS | Total 


Infection 
Chest 
Blood 
Biliary 
Abdominal 
Gastrointestinal 
Other 

Hepatic 
Bile leak 
Biliary stricture (anastomotic) 
Ischaemic cholangiopathy 
Biliary other 
Drainage of ascites 
Hepatic artery aneurysm 
Hepatic artery thrombosis 
Hepatic artery stenosis 
Hepatic artery other 
Hepatic vein thrombosis 
Portal vein thrombosis 
Portal vein stenosis 
Portal vein other 
Graft dysfunction 
Rejection 
Other 

Cardiovascular 
Congestive heart failure 
Myocardial infarction 
Other 

Dermatologic 
Seroma 

Gastrointestinal 
Colitis 
Diarrhea 
Other 

Genitourinary 
Renal insufficiency 
UTI 
Other 

Respiratory 
Cold/flu 
Pneumonia 
Shortness of breath 
Other 

Bleeding complications 


Bleeding — no transfusion required 


25 (19.5%) 
1 


44 (34.4%) 


WHwrPnnrPounoorron 


5 (3.9%) 


2 
1 (0.8%) 
1 
5 (3.9%) 
0 
3 
2 
8 (6.3%) 


Haemorrhage (bleeding requiring transfusion) 


Bleeding from hepatic artery 


Bleeding from liver parenchyma 


Other 
Fluid Collection 
Abdominal 
Pleural 
Other 
Device error 
Device user error 
Other systemic diseases 


1 (0.8%) 
2 (1.6%) 
17 (13.3%) 


The percentage of total events is reported in brackets. No statistical tests have been applied. 


17 (10.4%) 42 (14.4%) 
2 


13 


48 (29.3%) 
1 
11 




7 10 

5 (3.1%) 10 (3.4%) 
0 
3 
2 

0 (0.0%) 1 (0.3%) 
0 1 

6 (3.7%) 11 (3.8%) 


1 
2 
3 

17 (10.4%) 
13 
3 
1 

9 (5.5%) 


18 (11.0%) 
10 


1 (0.3%) 
2 (0.7%) 
55 (18.8%) 
SAMHD1 acts at stalled replication forks
to prevent interferon induction 


Flavie Coquel, Maria-Joao Silva, Hervé Técher, Karina Zadorozhny, Sushma Sharma, Jadwiga Nieminuszczy,
Clément Mettling, Elodie Dardillac, Antoine Barthe, Anne-Lyne Schmitz, Alexy Promonet, Alexandra Cribier,
Amélie Sarrazin, Wojciech Niedzwiedz, Bernard Lopez, Vincenzo Costanzo, Lumir Krejci, Andrei Chabes,
Monsef Benkirane, Yea-Lih Lin & Philippe Pasero


SAMHD1 was previously characterized as a dNTPase that protects cells from viral infections. Mutations in SAMHD1
are implicated in cancer development and in a severe congenital inflammatory disease known as Aicardi-Goutières
syndrome. The mechanism by which SAMHD1 protects against cancer and chronic inflammation is unknown. Here we 
show that SAMHD1 promotes degradation of nascent DNA at stalled replication forks in human cell lines by stimulating
the exonuclease activity of MRE11. This function activates the ATR-CHK1 checkpoint and allows the forks to restart 
replication. In SAMHD1-depleted cells, single-stranded DNA fragments are released from stalled forks and accumulate in 
the cytosol, where they activate the cGAS-STING pathway to induce expression of pro-inflammatory type I interferons. 
SAMHD1 is thus an important player in the replication stress response, which prevents chronic inflammation by limiting 
the release of single-stranded DNA from stalled replication forks. 


SAMHD1 (sterile alpha motif and HD domain-containing protein 1)
is a dNTPase that, in quiescent cells, restricts infection by HIV-1 and
other viruses. In cycling cells, this dNTPase activity is inhibited
by the phosphorylation of SAMHD1 by cyclin-dependent kinases.
Germline mutations in SAMHD1 cause Aicardi-Goutières syndrome
(AGS), a rare inflammatory encephalopathy characterized by overproduction
of type I interferons (IFNs). Other genes mutated in AGS
include TREX1, which encodes a 3′–5′ exonuclease that degrades
nucleic acids in the cytoplasm. Cytosolic DNA species that are not
degraded by TREX1 trigger the production of type I IFNs and other
cytokines through the cGAS-STING cytosolic DNA-sensing pathway.
The mechanism by which SAMHD1 inhibits this pathway is currently
unclear, but it has been proposed to involve an elusive 3′–5′ exonuclease
activity.

Besides AGS, SAMHD1 is frequently mutated in solid tumours and
in chronic lymphocytic leukaemia. Because SAMHD1 regulates
intracellular dNTP pools, imbalanced dNTP levels in SAMHD1-deficient
cells might perturb the progression of replication forks and
thus increase spontaneous mutagenesis. Moreover, SAMHD1
colocalizes with DNA repair foci in cells exposed to genotoxic
agents, suggesting that it may have a more direct role at DNA lesions
or at stalled replication forks. Fork stalling occurs when cells are
exposed to genotoxic agents such as hydroxyurea (HU) and camptothecin
(CPT), or when they encounter sequences that are intrinsically
difficult to replicate. Single-stranded DNA (ssDNA) accumulates
at stalled forks and recruits the checkpoint kinase ATR, which in
turn activates CHK1 to promote fork restart and prevent premature
entry into mitosis. Fork restart depends on the degradation of
nascent DNA strands by MRE11 through a process regulated by
BRCA2 and known as fork resection. Defective fork processing
leads to fork collapse, increased genomic instability and cancer
development.


SAMHD1 prevents release of ssDNA

To determine whether SAMHD1 has a role at stalled forks that could 
be important to prevent the accumulation of cytosolic DNA, we first 
constructed stable HEK293T cell lines expressing either a short hair- 
pin RNA (shRNA) against SAMHD1 (shSAM) or a scrambled shRNA 
control (shScr; Extended Data Fig. 1a). Confocal immunofluorescence 
microscopy confirmed the presence of cytosolic ssDNA in shSAM cells, 
but at a much lower level than in TREX1-depleted HEK293T cells,
used here as positive control (Fig. 1a, b and Extended Data Fig. 1b). 
SAMHD1 depletion also induced type I IFN and other pro-inflamma- 
tory cytokine mRNAs in HEK293T (Fig. 1c), HL116 cells (Extended 
Data Fig. 1c) and HeLa cells (Extended Data Fig. 1d). Importantly, 
both the accumulation of cytosolic ssDNA and the induction of IFN 
genes were substantially increased when shSAM cells were exposed to 
HU to induce replication fork stalling (Fig. 1a, b, d and Extended Data 
Fig. 1e). Moreover, when newly replicated DNA was labelled with BrdU
two hours before the addition of HU, this cytosolic DNA contained
BrdU (Fig. 1e and Extended Data Fig. 1f), indicating that it comes from
arrested forks and not, for example, from damaged mitochondria,
another possible source of cytosolic DNA (Extended Data Fig. 1g, h).
This induction of IFN genes by SAMHD1 depletion in HeLa, HEK293
and THP1 cells is mediated by the cGAS-STING pathway (Fig. 1f
and Extended Data Fig. 1i, j, n, o), as is the case in Samhd1−/− mice. It
also depends on the interferon regulatory factor IRF3 (Extended Data
Fig. 1k–n), both in the presence and in the absence of HU. Together, these
Fig. 1 | SAMHD1 prevents accumulation of cytosolic ssDNA and
induction of type I IFNs in response to replication stress. a, Cytosolic 
DNA (red) in control (shScr) and SAMHD1-depleted (shSAM) HEK293T 
cells exposed for 2 h to 4 mM HU. b, Mean fluorescence intensity (MFI) of 
cytosolic ssDNA per cell (n = 300). c, Expression of IFN and TNF mRNAs 
in shScr and shSAM HEK293T cells (mean and s.d. of three independent 
experiments). d, Induction of IFN and TNF mRNAs in shScr and shSAM 
HEK293T cells exposed to HU as in a. Mean and s.d. correspond to 
technical triplicates, representative of three independent experiments. 

e, Cytosolic DNA and BrdU-labelled DNA in shSAM HEK293T cells
incubated with BrdU for 2 h then without BrdU and in the presence of
4 mM HU for 2 h (n = 3). f, Expression of IFN-induced genes ISG15 and
MX1 in cGAS-knockout (cGAS), STING-knockout (STING) and control
(ctrl) SAMHD1-depleted HeLa cells. Mean and s.d. of three independent
experiments are shown. ****P < 0.0001, **P < 0.01, two-sided Mann–Whitney
test.


data indicate that SAMHD1 prevents the release of ssDNA from stalled 
forks and aberrant activation of the cGAS-STING pathway. 


SAMHD1 is involved in DNA replication 

To investigate the role of SAMHD1 in DNA replication, we first moni- 
tored its subcellular localization by immunofluorescence microscopy. 
SAMHD1 forms foci in HeLa cell nuclei that colocalize with replication 
sites (Fig. 2a and Extended Data Fig. 2a). By using the iPOND (isolation 
of proteins on nascent DNA) method”, we confirmed that SAMHD1 
is present at replication forks (Fig. 2b and Extended Data Fig. 2b). 
However, SAMHD1 persisted on newly replicated chromatin after a 
thymidine chase, unlike PCNA (Fig. 2b). Samhd1 also interacted with 
nascent DNA in Xenopus egg extracts and was recruited to chromatin 
in response to replication inhibition and DNA double-strand breaks 
(DSBs; Extended Data Fig. 2d-f). 

SAMHD1-depleted cells grew more slowly than control cells 
(Extended Data Fig. 3a, b) and had a longer S phase (Extended Data 
Fig. 3c). To determine whether this was due to slower DNA synthesis, 
we labelled control and shSAM cells with IdU and CldU and moni- 
tored the progression of individual forks by DNA fibre spreading. 
CldU tracks were much shorter in HEK293T and HeLa cells depleted 
of SAMHD1 than they were in control cells (Fig. 2c and Extended Data
Fig. 3d-f), indicating slowed DNA synthesis. Moreover, increased fork 
stalling was detected in shSAM cells by monitoring the asymmetry of 
Fig. 2 | SAMHD1 colocalizes with replication foci and regulates fork
progression. a, Immunolocalization of haemagglutinin (HA)-tagged 
SAMHD1 (green) and EdU-labelled replication foci (red) in HeLa 

cells, showing their colocalization (merge) in a DAPI-stained nucleus 
(blue; n = 2). b, iPOND analysis of SAMHD1 and PCNA in HEK293T 
cells. c, DNA fibre analysis of CldU track lengths (n = 250) in shScr 

and shSAM HEK293T cells. d, Intracellular dNTP pools in shScr and 
shSAM HEK293T cells (n = 2). e, Lengths of CldU tracks (n = 250) after 
addition of a balanced mix of nucleosides (+dN) 2 h before IdU and CldU 
labelling. f, Fork progression in shSAM HEK293T cells complemented 
with wild-type SAMHD1 or with phosphomimetic (+T592E) or non- 
phosphorylatable (+T592A) mutants (n = 200). In c, e and f, median track 
lengths are indicated in red. ****P < 0.0001, two-sided Mann-Whitney 
test. NS, not significant. 


sister forks (Extended Data Fig. 3g), even though the density of active 
forks was unchanged (P = 0.33, Extended Data Fig. 3h). Together, these 
data indicate that SAMHD1 promotes normal fork progression. 

To determine whether the replication defects observed in shSAM 
cells were due to imbalanced dNTP pools, we measured dNTP levels in 
shSAM HEK293T cells by HPLC (Fig. 2d and Extended Data Fig. 2c). 
In SAMHD1-depleted cells, dGTP levels were 2.5-fold higher than in 
controls, but dATP levels were only approximately 30% higher and 
dCTP and dTTP levels were unchanged. Moreover, the addition of a 
balanced mix of nucleosides to shSAM cells only partially rescued fork 
speed (Fig. 2e), suggesting that the role of SAMHD1 in S phase is not 
limited to the regulation of dNTP pools. 

SAMHD1 is phosphorylated on T592 by cyclin-dependent kinases 
(CDKs) during S and G2/M phases of the cell cycle (Extended Data
Fig. 4a). To determine how this phosphorylation affects DNA replica- 
tion, shSAM cells were complemented with phosphomimetic (T592E) 
or non-phosphorylatable (T592A) SAMHD1 mutants (Extended Data 
Fig. 4a). Unlike the T592A mutant, the T592E mutant fully rescued 
slow forks in shSAM cells (Fig. 2f). Because both cell types have similar 
dNTP levels (Extended Data Fig. 4b), these data indicate that phos- 
phorylation of SAMHD1 on T592 promotes fork progression inde- 
pendently of dNTP pools. Interestingly, normal fork progression was 
also restored to shSAM cells by the expression of the dNTPase-deficient 
K312A mutant but not by the dNTPase-proficient Y315A mutant
(Extended Data Fig. 4a, c). Fork progression was also impaired in 
nascent DNA at stalled forks. a, shScr and shSAM HEK293T cells were
labelled with IdU and CldU in the presence of 4 mM HU and in the
presence or absence of 50 µM mirin. Median IdU track lengths (n = 240)
are indicated in red. ****P < 0.0001, Mann–Whitney test. b, Analysis as
in a of IdU track lengths (n = 180) in shSAM HEK293T cells expressing
the T592A and T592E mutants of SAMHD1. c, Cytosolic ssDNA (red)
in shSAM HEK293T cells (n = 200) complemented with K312A, Y315A,
T592A and T592E mutants and exposed for 2 h to 4 mM HU. Scale bars,
5 µm. d, Mean fluorescence intensity of cytosolic ssDNA per cell (n = 200).
Relative difference to shSAM: *P < 0.05, ****P < 0.0001, Mann-Whitney 
test. e, In vitro nuclease assays of MRE11 in the absence and presence 

of increasing concentrations of SAMHD1. Mean and s.d. are from three 
independent experiments. f, Representative gel as in e. 


immortalized B cells from a patient with AGS (SAMHD1−/−) with a
Q548X mutation (Extended Data Fig. 4d), which does not affect dNTP
regulation. Together, these data indicate that phosphorylation of
SAMHD1 on T592 by CDK is required for normal fork progression. 


SAMHD1 promotes fork resection
Because SAMHD1 was proposed to have potential 3′–5′ exonuclease
activity similar to MRE11, we next asked whether it could be
involved in fork resection. We labelled newly synthesized DNA in con- 
trol and shSAM cells with IdU for 15 min then exposed them to HU for 
120 min in the presence of CldU. In control cells, a significant degra- 
dation of IdU tracks was observed after HU addition (Extended Data 
Fig. 5a), indicating resection. This was prevented by SAMHD1 depletion
and/or by addition of the MRE11 inhibitor mirin (Fig. 3a).
Similar results were obtained by using a different labelling strategy 
(Extended Data Fig. 5b) and aphidicolin (a drug that inhibits DNA polymerases
without affecting dNTP pools) instead of HU to stall replication
forks (Extended Data Fig. 5c). Moreover, the addition of exogenous
nucleosides did not restore fork resection in shSAM cells (Extended 
Data Fig. 5d) and dNTP pools were equally affected by HU exposure 
in control and shSAM cells (Extended Data Fig. 5e). Fork resection was 
also inhibited by the depletion of CtIP, a cofactor of MRE11 (Extended 
Data Fig. 5f) and by depleting SMARCAL1 (Extended Data Fig. 5g).
This suggests that resection occurs partly on reversed forks, which form
upon reannealing of nascent DNA strands.

To investigate whether SAMHD1 is also required to resect DNA ends 
during DSB repair, we used the single-molecule analysis of resection 
tracks (SMART) method and an in vivo DNA repair assay in shSAM
and control cells. These approaches demonstrated that SAMHD1 is 
indeed required to resect DNA ends and repair DSBs (Extended Data 
Fig. 6a-e), confirming a recent report. Also, we showed that Xenopus
Samhd1 is recruited to chromatin after EcoRI digestion (Extended Data 
Fig. 6f). Altogether, these data indicate that SAMHD1 acts with MRE11 
and CtIP to promote resection at forks and DSBs, independently of 
dNTP pools. 

Fork resection in shSAM cells was fully restored by the phosphomi- 
metic T592E mutant of SAMHD1, but not by the non-phosphorylatable 
T592A mutant (Fig. 3b) and it was restored by the dNTPase-defective 
mutant K312A, but not by the dNTPase-proficient mutant Y315A 
(Extended Data Fig. 5h), suggesting the resection function of SAMHD1 
requires phosphorylation but not the dNTPase activity. Also, fork resec- 
tion in immortalized B cells was impaired by the Q548X mutation 
found in a patient with AGS (Extended Data Fig. 5i). The resection-pro- 
ficient mutants, T592E and K312A, prevented the accumulation of 
cytosolic DNA whereas the resection-deficient mutants, T592A and 
Y315A, did not (Fig. 3c, d). Altogether, these data indicate that muta- 
tions in SAMHD1 affecting fork progression and nascent DNA resec- 
tion are also associated with the release of ssDNA fragments into the 
cytoplasm. 


SAMHD1 activates the MRE11 exonuclease 

To characterize further the function of SAMHD1 at the biochemical 
level, we assayed the binding of human SAMHD1 to various DNA sub- 
strates (Extended Data Fig. 7a, b) and found a high affinity for ssDNA 
and different fork structures. Remarkably, SAMHD1 stimulated three- 
fold the exonuclease activity of MRE11 (Fig. 3e, f), whereas it did not 
increase the activity of DNA2, FEN1 and bacterial ExoIII (Extended
Data Fig. 7c). We assayed direct binding of SAMHD1 to labelled 
MRE11 by using microscale thermophoresis and measured a dissociation
constant (Kd) of 977 ± 176 nM (mean ± s.e.m.), suggesting that
stimulation of the exonuclease activity of MRE11 by SAMHD1 is due 
to a direct interaction (Extended Data Fig. 7d). In addition to binding 
to MRE11, we found that SAMHD1 directly interacts with replication 
protein A (RPA) with a Kd = 312 ± 57 nM, as determined by microscale
thermophoresis (Extended Data Fig. 7e) and by co-immunoprecipita- 
tion in Xenopus egg extracts (Extended Data Fig. 7f). Xenopus Samhd1 
also interacted with the resection protein CtIP (Extended Data Fig. 7g). 
Together, these data suggest that SAMHD1 binds with high affinity 
to RPA and fork structures, where it interacts directly with MRE11 to 
selectively stimulate its exonuclease activity. 
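
To put the two measured dissociation constants in perspective, the sketch below computes the fraction of labelled partner that would be bound at a given SAMHD1 concentration. It assumes simple 1:1 binding with SAMHD1 in large excess over the labelled protein, which is a simplification and not necessarily the model used to fit the thermophoresis data; the function name `fraction_bound` and the 500 nM example concentration are illustrative assumptions.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of labelled target bound, assuming 1:1 binding and ligand excess."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be non-negative and Kd must be positive")
    return ligand_nM / (ligand_nM + kd_nM)


# Mean Kd values reported in the text (nM).
KD_MRE11_NM = 977.0  # SAMHD1 binding to MRE11
KD_RPA_NM = 312.0    # SAMHD1 binding to RPA

if __name__ == "__main__":
    # 500 nM is a hypothetical SAMHD1 concentration chosen for illustration.
    for partner, kd in (("MRE11", KD_MRE11_NM), ("RPA", KD_RPA_NM)):
        print(f"{partner}: {fraction_bound(500.0, kd):.2f} bound at 500 nM SAMHD1")
```

Under these assumptions, half of the labelled partner is bound when the SAMHD1 concentration equals Kd, so the lower Kd for RPA corresponds to tighter binding than for MRE11.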


SAMHD1 activates CHK1 at stalled forks 
Because SAMHD1 binds to forks, MRE11 and RPA in vitro, and 
MRE11 is reported to have a role in CHK1 activation at stalled forks,
we asked whether the recruitment of MRE11 and the formation of 
RPA-coated ssDNA at stalled forks depends on SAMHD1. To address 
this possibility, we first examined the colocalization of MRE11 and 
EdU-labelled replication sites in HU-treated shSAM and control cells 
and found that their colocalization requires SAMHD1 (Extended Data 
Fig. 8a). To test whether the formation of RPA-coated ssDNA also 
depends on SAMHD1, we looked for ssDNA in HU-treated shSAM 
and control cells by detecting BrdU incorporation under non-dena- 
turating conditions. Both ssDNA formation (Extended Data Fig. 8b) 
and the formation of RPA foci (Extended Data Fig. 8c) depended on 
the presence of SAMHD1. Incidentally, we also observed a fourfold 
increase in BrdU-labelled cytosolic ssDNA in shSAM cells (Extended 
Data Fig. 8b, asterisks), consistent with the data shown in Fig. 1a, b. 
Because RPA recruits ATR to stalled forks, we next monitored the 
phosphorylation of the ATR targets CHK1 and H2AX in cells exposed 
either to HU or to CPT (Extended Data Fig. 8d, e). Phosphorylation 
of both proteins was much reduced in shSAM cells when compared 
to control cells, indicating that activation of the ATR-CHK1 pathway 
depends on SAMHD1. CHK1 phosphorylation was restored upon com- 
plementation of shSAM cells with wild-type SAMHD1 (Extended Data 




© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Fig. 4 panels. a, Cytosolic ssDNA in shSAM cells + HU: siCtrl, + mirin, siRECQ1 and siBLM (scale bar, 5 µm), with quantification for siRNAs Ctrl, WRN, BLM, RECQ1, DNA2, CtIP and Ctrl + mirin. b, c, siRNAs Scr, SAM, RECQ1 and SAM + RECQ1; genes IFNA, IFNG, TNF, MX1 and ISG15.]


Fig. 4 | Depletion of RECQ1 prevents the IFN response in SAMHD1- 
depleted cells. a, Cytosolic ssDNA in HU-treated shSAM HEK293T cells 
transfected for 48 h with short interfering RNAs (siRNAs) against WRN, 
BLM, RECQ1 (also known as RECQL), DNA2 and CtIP (also known as
RBBP8) or treated with mirin (n = 200; two experiments). b, Induction
of IFNs (luciferase assay) in HL116 cells transfected with siRNAs against
SAMHD1 (SAM), RECQ1 (RECQ1) or both (SAM + RECQ1), or with
a control scrambled siRNA (Scr; n = 3). Data are representative of three
experiments. c, Total mRNA was extracted from HL116 cells after siRNA 
transfection and the expression of IFNA, IFNG, TNF, MX1 and ISG15 was 
quantified by qRT-PCR (n = 3). Error bars denote s.d. of triplicates from 
representative experiments. 


Fig. 8f) or with the T592E and K312A mutants, but not with T592A and 
Y315A mutants, which are defective in fork resection (Extended Data 
Fig. 8g, h). Moreover, the activation of Chk1 in Xenopus egg extracts
treated with aphidicolin or DSB-inducing agents was also inhibited in 
the absence of Samhd1 or Mre11 activity (Extended Data Fig. 8i-k). 
We also found that SAMHD1 acts together with MRE11 for the restart 
of CPT-arrested forks (Extended Data Fig. 8l). Together, these findings
indicate a novel role for SAMHD1 in the activation of the ATR-CHK1 
pathway and in fork restart. 


RECQ1 induces IFNs in shSAM cells 

We hypothesized that the cytosolic ssDNA observed in shSAM cells 
might result from the displacement of nascent DNA by a helicase and 
cleavage by a flap endonuclease. To test this hypothesis, we depleted 
shSAM cells of BLM, WRN, RECQ1, DNA2 and CtIP (Extended 
Data Fig. 9a) and exposed them to HU. Depletion of RECQ1 and, to 
a lesser extent, DNA2 and CtIP resulted in a significant reduction in 
the cytosolic ssDNA seen in both HU-treated and untreated shSAM 
cells (Fig. 4a and Extended Data Fig. 9b-e). RECQ1 depletion also 
reduced the expression of IFN genes in HL116 (Fig. 4b, c), HEK293T 
and HeLa cells (Extended Data Fig. 9f, g) and the degradation of IdU 
tracks (Extended Data Fig. 9h), suggesting that RECQ1 contributes 
to fork resection and is responsible for the observed cytosolic ssDNA. 
Levels of cytosolic ssDNA were also reduced by mirin (Fig. 4a), sug- 
gesting that the endonuclease activity of MRE11 cleaves these displaced 
strands, releasing short ssDNA fragments, as it does at DSBs.

To investigate further how cytosolic ssDNA is produced, shSAM 
cells were labelled with IdU before the addition of HU and with CldU 
during the HU arrest, and then released from HU in the presence of 
thymidine to chase CldU. Remarkably, both IdU- and CldU-labelled 
ssDNA were detected in the cytoplasm (Extended Data Fig. 10a, b), 
indicating that DNA synthesized both before and after replication fork
stalling contributes to the cytosolic ssDNA seen in HU-treated cells 
(Extended Data Fig. 10c). 

We conclude from our data that SAMHD1 functions during S phase 
at stalled DNA replication forks to activate the ATR-CHK1 pathway 
and to promote the resection of gapped or reversed forks by binding 
to and activating the MRE11 exonuclease. This function is indepen- 
dent of its previously described role in maintaining balanced dNTP 
pools, which is also important for accurate DNA replication. This
novel role of SAMHD1 depends on its phosphorylation by S-phase 
CDKs, which act as a switch to control its dNTPase-dependent and 
-independent functions. 

In the absence of SAMHD1, ssDNA molecules are released from 
stalled replication forks and accumulate in the cytoplasm, where they 
activate the cGAS-STING pathway, inducing expression of type I IFNs 
and other pro-inflammatory cytokine genes (Extended Data Fig. 10d). 
This is consistent with recent reports showing that DNA repair byproducts
can also induce the production of type I IFNs. Mutations in
SAMHD1 that prevent the degradation of nascent DNA at stalled forks 
increase cytosolic DNA, whereas mutations that affect the dNTPase 
activity of SAMHD1 do not, confirming thereby that defective fork 
processing and aberrant induction of the IFN pathway are directly 
linked. This mechanism is distinct from the induction of cGAS-STING 
by micronuclei recently reported in cells deficient for RNase H2, the 
enzyme encoded by the RNASEH2A, RNASEH2B and RNASEH2C 
genes, which are also frequently mutated in AGS. Because replication
stress contributes to cancer development, our findings also
shed new light on the mechanism by which SAMHD1 mutations might 
promote tumorigenesis. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cell lines and cell culture. HEK293T cells (ATCC CRL-3216) and SAMHD1-overexpressing
HeLa cells were cultured at 37 °C under 5% CO2 in DMEM
supplemented with 10% fetal calf serum (FCS). Epstein-Barr virus-immortalized
B cells from a SAMHD1−/− patient were cultured in RPMI-1640 supplemented
with 10% FCS. Reporter HL116 cells were cultured in DMEM supplemented with
10% FCS and ultra-glutamine under HAT selection.

Construction of SAMHD1 mutants. A SAMHD1 wild-type clone (Invitrogen;
clone ID: IOH55544, sequence NM_015474.3) with a C-terminal Flag and
a haemagglutinin (HA) tag was modified to produce a shRNA-resistant coding
sequence. The DNA was amplified by PCR with the appropriate primers to create
a SAMHD1 protein-coding sequence with an HA tag fused to the second amino
acid residue from the N terminus and to restore the normal C terminus. The PCR
reaction also created a MluI site after the sequence encoding the protein C terminus
and an AgeI site followed by a Kozak sequence before the sequence encoding the
protein N terminus. The amplified DNA was cloned in p-TripZ (Thermo Scientific
Open Biosystems) between AgeI and MluI. All mutations were created by using
the QuikChange II XL kit from Agilent. The complete SAMHD1-coding sequence
was checked by sequencing for each mutant. K312A: AAA to GCA; Y315A: TAT to
GCT; T592A: ACA to GCA; T592E: ACA to GAA. HIV-1-derived lentiviral vectors
were produced in HEK293T cells as previously described.
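
The codon changes listed above can be checked against the standard genetic code. The following is a minimal sketch: the codon table is restricted to the six codons named in this section, and the helper names `encodes_substitution` and `bases_changed` are illustrative, not part of any published pipeline.

```python
# Standard genetic code, restricted to the codons named in the Methods.
CODON_TO_AA = {
    "AAA": "K", "TAT": "Y", "ACA": "T",
    "GCA": "A", "GCT": "A", "GAA": "E",
}

# Mutant name -> (wild-type codon, mutant codon), as listed in the text.
MUTATIONS = {
    "K312A": ("AAA", "GCA"),
    "Y315A": ("TAT", "GCT"),
    "T592A": ("ACA", "GCA"),
    "T592E": ("ACA", "GAA"),
}


def encodes_substitution(name: str, wild_type: str, mutant: str) -> bool:
    """True if the codon change encodes the substitution in `name` (e.g. K312A)."""
    return CODON_TO_AA[wild_type] == name[0] and CODON_TO_AA[mutant] == name[-1]


def bases_changed(wild_type: str, mutant: str) -> int:
    """Number of nucleotide positions that differ between two codons."""
    return sum(a != b for a, b in zip(wild_type, mutant))


if __name__ == "__main__":
    for name, (wt, mut) in MUTATIONS.items():
        ok = encodes_substitution(name, wt, mut)
        print(f"{name}: {wt} -> {mut}, {bases_changed(wt, mut)} base(s) changed, consistent={ok}")
```

Each listed codon pair is consistent with the named amino-acid substitution; T592A needs a single base change, whereas the other three need two.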

Detection of cytosolic ssDNA. Cells were treated with 4 mM HU for 2 h, or 
untreated, as appropriate, and then incubated in fresh medium for 2 h and fixed 
for 20 min with 80% methanol at −20 °C. Coverslips were incubated with a mouse
anti-ssDNA antibody (see Supplementary Information) overnight at 4 °C and with 
a secondary antibody conjugated to an Alexa Fluor dye for 1 h at room tempera- 
ture, followed by DAPI staining. Images were acquired by using a Zeiss ApoTome 
microscope and a LSM780 confocal microscope. For the detection of BrdU-labelled 
cytosolic DNA, cells were labelled for 2 h with 10 µM BrdU before HU exposure
and BrdU was detected with a rat anti-BrdU antibody without denaturation. The 
mean fluorescence intensity of cytosolic ssDNA and BrdU was quantified with 
the CellProfiler software. Nuclei were segmented using DAPI signal and masked 
to quantify ssDNA and BrdU signals. Statistical analysis was performed with 
GraphPad Prism (GraphPad Software) using two-sided Mann-Whitney rank 
sum test. 
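
The quantification step described above (segment nuclei on the DAPI channel, mask them, then average the remaining ssDNA signal) can be illustrated on a toy image. This is a sketch in plain Python and not the CellProfiler pipeline used in the study; the DAPI threshold, the function name `cytosolic_mfi` and the 3 × 3 pixel values are assumptions made for illustration.

```python
def cytosolic_mfi(ssdna, dapi, cell_mask, dapi_threshold):
    """Mean ssDNA intensity over pixels inside the cell but outside the nucleus.

    ssdna, dapi: 2D lists of pixel intensities with identical shape.
    cell_mask:   2D list of bools marking pixels that belong to the cell.
    Pixels with DAPI >= dapi_threshold are treated as nuclear and excluded.
    """
    values = [
        ssdna[r][c]
        for r in range(len(ssdna))
        for c in range(len(ssdna[r]))
        if cell_mask[r][c] and dapi[r][c] < dapi_threshold
    ]
    if not values:
        raise ValueError("no cytosolic pixels found for this cell")
    return sum(values) / len(values)


# Hypothetical 3 x 3 cell: the bright DAPI pixel in the centre is the nucleus.
ssdna = [[10, 20, 10], [20, 90, 20], [10, 20, 10]]
dapi = [[0, 0, 0], [0, 200, 0], [0, 0, 0]]
cell = [[True] * 3 for _ in range(3)]

print(cytosolic_mfi(ssdna, dapi, cell, dapi_threshold=100))  # 15.0
```

Masking the nucleus matters because nuclear ssDNA signal (90 in the toy image) would otherwise dominate the per-cell mean.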

Quantification of cytosolic mitochondrial DNA. Cells growing on 100-mm 
coverslips were collected into 200 µl fractionation buffer (20 mM HEPES, pH 7.4,
10 mM KCl, 2 mM MgCl2, 1 mM EDTA, 1 mM EGTA, 1 mM DTT) and left on
ice for 15 min. They were then lysed by passing cell suspensions 10 times through 
a 27-gauge needle attached to a 1-ml syringe and left on ice for a further 20 min. 
Cell lysates were centrifuged at 720g for 5 min at 4 °C. The supernatant was further 
centrifuged at 10,000g for 5 min at 4 °C to remove mitochondria. Cytosolic DNA 
was purified from the supernatant by using DNeasy Blood & Tissue kit (Qiagen) 
according to the manufacturer’s instructions. The presence in the cytosolic fraction 
of the mitochondrial gene MT-CO1 encoding the cytochrome c oxidase 1 (COX1) 
was quantified by qPCR using specific primers. 

Analysis of IFN production. Cell culture media were collected 48 h after transfec- 
tion with various siRNAs. Reporter HL116 cells harbouring a type-I IFN-inducible 
6-16 promoter fused to the firefly luciferase gene were cultured for 48 h in the
presence of the collected culture media. Luciferase activity was quantified accord- 
ing to the manufacturer’s instructions (Luciferase assay system, Promega). The 
expression of various IFNs and IFN-stimulated genes (ISGs) was quantified by 
qRT-PCR (see Supplementary Information) and normalized to GAPDH. Reverse 
transcription was performed by using the SuperScript III First-Strand Synthesis System for
RT-PCR (ref. 18080-051, Invitrogen). A LightCycler 480 SYBR Green I Master Mix 
(ref. 04887352001, Invitrogen) was used to perform quantitative PCR. 
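
Normalization of a target transcript to GAPDH is commonly expressed as a fold change relative to a control sample. The sketch below uses the 2^(−ΔΔCt) calculation, which assumes roughly 100% amplification efficiency for both primer pairs; the text states only that expression was normalized to GAPDH, so the choice of this formula, the function name `relative_expression` and the Ct values are assumptions for illustration.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control by the 2^(-ddCt) method.

    Each Ct is first normalized to the reference gene (here GAPDH) in the
    same sample; the control's normalized value is then subtracted.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target gene and GAPDH in shSAM and shScr cells.
fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                           ct_target_control=26.0, ct_ref_control=18.0)
print(f"fold induction in shSAM relative to shScr: {fold:.1f}")  # 4.0
```

A target that crosses threshold two cycles earlier in the sample than in the control, at equal GAPDH Ct, therefore corresponds to a fourfold induction.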

Cell sorting. HEK293T cells expressing the Scramble-control shRNA (shScr) were 
labelled with propidium iodide. Populations of G1, S and G2/M phase cells were
sorted in a FACSAria sorter (Becton Dickinson, MRI facility). 

Flow cytometry analysis of S phase progression. HEK293T cells transduced 
with lentiviral vectors expressing shScr or shSAMHD1 (shSAM) RNAs were pulse 
labelled with 10 µM EdU for 15 min and then chased with 100 µM thymidine for
the indicated periods of time. After fixation with 1% formaldehyde for 30 min 
at room temperature, EdU incorporation was detected by using Click chemistry 
according to the manufacturer's instructions (Click-iT EdU Flow Cytometry Cell 
Proliferation Assay, Invitrogen). The cells were resuspended in PBS containing 
1% (w/v) BSA, 2 µg ml−1 DAPI and 0.5 mg ml−1 RNase A for 30 min at room
temperature and were analysed in a MACSQuant flow cytometer (Miltenyi Biotec). 
The percentages of cells in G1, S and G2/M phases were quantified by using FlowJo
single-cell analysis software (FlowJo, LLC). 


Confocal microscopy of DNA replication foci. HeLa cells overexpressing 
HA-tagged SAMHD1 and GFP proteins were grown on coverslips. They were 
labelled for 1 h with 10 µM EdU then fixed with 1% paraformaldehyde (PFA)
and permeabilized with 0.1% saponin. Coverslips were then incubated overnight 
at 4 °C with an anti-HA antibody. Replication foci were detected by using Click 
chemistry, as for flow cytometry analysis. The cell nuclei were stained with DAPI 
and the coverslips were then mounted on glass slides and visualized by using a 
Zeiss LSM780 confocal microscope. The percentage of HA-EdU foci colocalization 
was quantified by ImageJ-JACoP (imagej.nih.gov/ij/plugins/track/jacop.html). 
Statistical analysis was performed with GraphPad Prism (GraphPad Software) 
using two-sided Mann-Whitney rank sum test. 

iPOND. HeLa S3 cells (1 x 10° per ml) or HEK293FT cells growing logarithmically
were incubated with 10 µM EdU for 10 min. After EdU labelling, cells were
fixed in 1% formaldehyde, quenched by adding glycine to a final concentration 
of 0.125 M and washed three times in PBS. Collected cell pellets were frozen at 
−80 °C and cells were permeabilized by resuspending 1.0–1.5 × 10⁷ cells per ml
in ice-cold 0.25% Triton X-100 in PBS and incubating for 30 min. Before the Click 
reaction, samples were washed once in PBS containing 0.5% BSA and once in PBS. 
Cells were incubated for 1 h at room temperature in Click reaction buffer containing
10 µM azide-PEG(3+3)-S-S-biotin conjugate (Click Chemistry Tools, cat. no
AZ112-25), 10 mM sodium ascorbate, and 1.5 mM CuSO4 in PBS. The ‘no Click’
reaction contained DMSO instead of biotin-azide. Following the Click reaction, 
cells were washed once in PBS containing 0.5% BSA and once in PBS. Cells were 
resuspended in lysis buffer (50 mM Tris-HCl pH 8.0, 1% SDS) containing protease 
inhibitor cocktail (Sigma) and sonicated with a Diagenode Bioruptor Plus for 40 
cycles (30 s on/30 s off). Samples were centrifuged at 14,500g at 4 °C for 30 min 
and the supernatant was diluted 1:3 with TNT buffer (50 mM Tris pH 7.5, 200 mM 
NaCl, 0.3% Triton X-100) containing protease inhibitors. An aliquot was taken as 
an input sample. Streptavidin–agarose beads (Novagen) were washed three times
in TNT buffer containing protease inhibitor cocktail. Two hundred microlitres 
of bead slurry were used per 1 x 10° cells. The streptavidin-agarose beads were 
resuspended 1:1 in TNT buffer containing protease inhibitors and added to the 
samples, which were then incubated at 4 °C for 16 h in the dark. Following bind- 
ing, the beads were washed twice with 1 ml TNT buffer, twice with TNT buffer 
containing 1 M NaCl and twice with TNT buffer. Protein-DNA complexes were 
eluted by incubating with 5 mM DTT in TNT buffer. Cross-links were reversed 
by incubating samples in SDS sample buffer at 95 °C for 20 min. Proteins were 
resolved on SDS-PAGE and detected by immunoblotting using specific antibodies. 

DNA fibre spreading. DNA fibre spreading was performed as described previously.
In brief, subconfluent cells were sequentially labelled first with 10 µM
5-iodo-2′-deoxyuridine (IdU) and then with 100 µM 5-chloro-2′-deoxyuridine
(CldU) for the indicated times. One thousand cells were loaded onto a glass slide 
(StarFrost) and lysed with spreading buffer (200 mM Tris-HCl pH 7.5, 50 mM 
EDTA, 0.5% SDS) by gently stirring with a pipette tip. The slides were tilted slightly 
and the surface tension of the drops was disrupted with a pipette tip. The drops 
were allowed to run down the slides slowly, then air dried, fixed in 3:1 metha- 
nol:acetic acid for 10 min, and allowed to dry. Glass slides were processed for 
immunostaining with mouse anti-BrdU to detect IdU, rat anti-BrdU to detect 
CldU, mouse anti-ssDNA antibodies (see Supplementary Information for details) 
and corresponding secondary antibodies conjugated to various Alexa Fluor dyes. 
Nascent DNA fibres were visualized by using immunofluorescence microscopy 
(Leica DM6000 or Zeiss ApoTome). The acquired DNA fibre images were ana- 
lysed by using MetaMorph Microscopy Automation and Image Analysis Software 
(Molecular Devices). Statistical analysis was performed with GraphPad Prism 
(GraphPad Software) using two-sided Mann-Whitney rank sum test. 
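The rank sum comparison of track lengths can be illustrated with a short sketch. The authors used GraphPad Prism; the pure-Python function below is only an assumed stand-in that computes the two-sided Mann-Whitney U statistic with a tie-corrected normal approximation and continuity correction. The function name and the example track lengths are hypothetical, and p-values for small samples may differ from software that uses the exact distribution.

```python
import math


def mann_whitney_two_sided(x, y):
    """Return (U, p) for a two-sided Mann-Whitney rank sum test.

    Uses the normal approximation with tie and continuity corrections.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u, p


if __name__ == "__main__":
    # Hypothetical CldU track lengths (µm) for two conditions.
    control = [12.1, 10.4, 11.8, 13.0, 9.7, 12.6]
    depleted = [7.9, 8.4, 6.5, 9.1, 7.2, 8.8]
    u_stat, p_value = mann_whitney_two_sided(control, depleted)
    print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

Because the test works on ranks rather than raw lengths, it makes no assumption that track lengths are normally distributed, which suits the skewed length distributions typical of fibre data.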

SMART. Control (shScr) and SAMHD1-depleted (shSAM) HEK293T cells 
were labelled with 10 µM BrdU for 24 h. They were then treated with 5 µM ble- 
ocin (Calbiochem) for 1 h and collected at the indicated time points. They were 
processed for DNA fibre spreading as described. BrdU tracks were stained with 
anti-BrdU antibody without DNA denaturation and visualized by fluorescence 
microscopy (Zeiss ApoTome). The acquired DNA fibre images were analysed 
by using MetaMorph Microscopy Automation and Image Analysis Software 
(Molecular Devices) and statistical analysis was performed with GraphPad Prism 
(GraphPad Software) using two-sided Mann-Whitney rank sum test. 
Single-strand annealing assay. Seven hours after plating in six-well plates (1 x 10° 
cells per well), U2OS single-strand annealing cells were transfected with siRNAs 
against SAMHD1 or CtIP, or with control siRNAs (see Supplementary Information 
for sequences) by using interferin reagent (Polyplus, Ozyme). Forty-eight hours 
after siRNA transfection, HA-tagged I-SceI was expressed by tran- 
sient transfection (JetPEI, Polyplus, Ozyme) with 1.5 µg of the expression plasmid 
pCMV-I-SceI‘? for DSB induction. Expression was verified by immunoblotting 
with an anti-HA antibody. Forty-eight hours after transfection with the construct 
expressing I-SceI, cells were fixed in PBS containing 2% formaldehyde for 15 min 
at room temperature. Cells expressing GFP were monitored by flow cytometry in 
a BD Accuri C6 flow cytometer. 

Collection and processing of HEK293T cells for nucleotide pool measurements. 
Three 150 x 20 mm cell culture dishes were seeded with (8 x 10°) HEK293T 
cells and grown to 80% confluency. After removal of the medium, the plates were 
washed twice with cold NaCl (9 g l⁻¹). The cells were scraped with a cell scraper 
after addition of 250 µl of cold 15% TCA containing 30 mM MgCl₂ to each of the 
three plates and they were pooled together. The pooled cells were collected in an 
Eppendorf tube and snap-frozen in liquid nitrogen. They were stored at −80 °C 
until processed. For extraction of dNTPs and NTPs, cells were thawed on ice and 
vortex-mixed for 10 min in a cold room. The supernatant was then collected by 
centrifugation at 14,000g for 5 min at 4 °C and processed as described“. 
Detection of ssDNA, RPA and MRE11 foci. Cells growing on coverslips were 
labelled with 10 µM BrdU for 24 h. They were treated with 1 mM HU for 2 h after 
the removal of BrdU and fixed with ice-cold methanol for 1 h at −20 °C. The 
coverslips were incubated with an anti-BrdU monoclonal antibody (without DNA 
denaturation) overnight at 4 °C and then with a secondary antibody conjugated to 
an Alexa Fluor dye for 1 h at room temperature, followed by DAPI staining. Images 
were acquired with a wide-field Leica DM6000 microscope and a Zeiss LSM780 
confocal microscope. The number of subnuclear ssDNA foci was quantified by 
using CellProfiler image analysis software. For detection of chromatin-bound RPA 
foci during S phase, cells growing on coverslips were pulse labelled with 10 µM EdU 
for 10 min and treated with 4 mM HU for 2 h. They were fixed with 4% PFA in PBS 
for 15 min and then incubated for 3 min at 4 °C with CSK buffer (10 mM PIPES 
pH 6.8, 100 mM NaCl, 1 mM MgCl₂, 1 mM EGTA, 300 mM sucrose, 0.5 mM 
DTT) containing 0.25% Triton X-100 and phosphatase inhibitor cocktail (Sigma- 
Aldrich, P0044). EdU incorporation was detected by using Click chemistry and 
then blocked with 3% BSA in PBS for 1 h at room temperature. The coverslips were 
incubated with an anti-RPA antibody (overnight at 4 °C) and then with a second- 
ary antibody conjugated to an Alexa Fluor dye for 1 h at 37 °C, followed by DAPI 
staining. Images were acquired by using a Zeiss ApoTome microscope. The mean 
fluorescence intensity in EdU-positive cells was quantified by using CellProfiler 
(http://www.cellprofiler.org). For the detection of MRE11 foci, cells seeded on 
coverslips were labelled with EdU and treated with or without HU as described 
above. They were incubated in cold extraction buffer (20 mM HEPES pH 7.5, 50 
mM NaCl, 300 mM sucrose, 3 mM MgCl₂, 0.5% Triton X-100) for 5 min at 4 °C and 
then fixed in fixation buffer (3.7% PFA, 2% sucrose, 0.5% Triton X-100) for 20 min 
at room temperature. The coverslips were incubated with an anti-MRE11 antibody 
overnight at 4 °C after blocking in PBS containing 1% BSA for 1 h at room 
temperature. After incubation with a secondary antibody conjugated to Alexa Fluor 
dye, EdU incorporation was detected by using Click chemistry, followed by DAPI 
staining. Images were acquired by using a Zeiss LSM780 confocal microscope. The 
percentage of colocalized MRE11 and EdU foci was quantified by ImageJ-JACoP. 
Cloning and purification of recombinant Xenopus Samhd1. The reference 
sequence of Xenopus Samhd1 (xSamhd1), available from Uniprot, Q6INN8 (http:// 
www.uniprot.org), contains 632 amino acids. In brief, the cDNA encoding Xenopus 
laevis samhd1 (XGC Samhd1, clone ID 3402629, from Thermo Fisher Scientific) 
was cloned into the pET43 plasmid following standard methods. Clones were 
validated by PCR and DNA sequencing. 

Purification of recombinant xSamhd1 (His- or His-Flag-tagged) was performed 
in two steps after IPTG (0.5 mM) induction of the bacteria overnight at 23 °C. The 
bacterial pellet was lysed in a buffer containing 500 mM NaCl, 50 mM HEPES pH 
7.5, 2 mM β-mercaptoethanol, 10% glycerol and protease inhibitor cocktail, then 
sonicated and clarified by centrifugation. The first purification step was performed 
by binding the recombinant xSamhd1 to Ni-NTA resin (HisPur, Thermo Fisher 
Scientific) for 2 h at 4 °C and eluting with 100 mM and 250 mM imidazole. After 
dialysis, the recombinant protein was further purified by size exclusion chromatog- 
raphy (Superdex 200 10/300 GL, GE Healthcare). For use in Xenopus egg extracts, 
both His- and His-Flag-xSamhd1 were purified in buffer containing 150 mM 
NaCl, 25 mM Tris-HCl pH 7.6, 1 mM DTT, 10% glycerol. 

Production of anti-xSamhd1 antibody. For antibody production, the recom- 
binant His-xSamhd1 was purified, as described above, but size exclusion chro- 
matography was performed in phosphate-buffered saline. This preparation was 
supplied to BioGenes GmbH, who immunised rabbits and prepared the antiserum. 
The antibody was affinity purified from the antiserum by binding to recombinant 
His-xSamhd1 by the Cogentech service of the IFOM protein facility. 
Preparation of interphase Xenopus egg extracts and treatments. Xenopus egg 
extracts were prepared as described**. The DNA damage response and DNA rep- 
lication were analysed as previously described**®. To induce DNA double-strand 
breaks, we added 0.05 U µl⁻¹ EcoRI (New England Biolabs) to the extracts. To 
slow or stall replication forks, we added aphidicolin (Sigma) to the extracts at the 
concentrations indicated in the main text. 

Immunodepletion of xSamhd1 in Xenopus egg extracts. To immunodeplete 
xSamhd1 from Xenopus egg extracts, 0.3–0.5 ml of egg extract were incubated with 
affinity-purified IgGs (30 mg) and with 120 µl of Dynabeads-Protein A (Thermo 
Fisher, 10002D). The immunodepletion was performed in three rounds of 30-45 
min at 4 °C. 

Immunoprecipitation of xSamhd1 in Xenopus egg extract. For coimmuno- 
precipitation of xSamhd1 with RPA70, 200 µl of Dynabeads-Protein A slurry 
were coupled with 60 µg of anti-xSamhd1 purified serum or 50 µg of rabbit pre- 
immune serum (IgG control). The beads were then incubated with 500 µl of clar- 
ified Xenopus egg extract brought to a final volume of 1.5 ml with NIB buffer 
(50 mM KCl, 50 mM HEPES pH 7.0, 5 mM MgCl₂, 2 mM DTT, 0.5 mM sper- 
midine, 0.15 mM spermine, 20% sucrose, 0.05% Triton X-100). After 1 h at 4 °C, 
a ‘flow-through’ fraction of the unbound material was recovered and boiled in 
Laemmli sample buffer (Biorad) for further analysis. Beads were washed five times 
in NIB buffer and boiled in Laemmli sample buffer before analysis by western 
blotting. For coimmunoprecipitation of xSamhd1 with xCtIP, 100 µl Dynabeads- 
Protein A slurry was coupled to 30 µg anti-xSamhd1 (affinity purified serum) or 
50 µg of pre-immune serum (IgG control). The beads were then incubated with 
300 µl of clarified Xenopus egg extract brought to a final volume of 1 ml with a 
buffer containing 100 mM KCl, 50 mM HEPES pH 7.0, 5 mM MgCl₂, 2 mM DTT, 
20% sucrose, 0.05% Triton X-100. After overnight incubation at 4 °C, a fraction of 
the flow-through was recovered and boiled in Laemmli sample buffer for further 
analysis. Beads were washed five times in the immunoprecipitation buffer and 
boiled in Laemmli sample buffer before analysis by western blotting. 
Biotin-dUTP pulldown of nascent chromatin. Biotin-dUTP pulldown was per- 
formed as previously described”*. In brief, Xenopus sperm nuclei were added 
to 100 µl Xenopus egg extracts at a final concentration of 4,000 nuclei per µl. 
Forty-five minutes after sperm nuclei addition, newly synthesized DNA in the 
extracts was labelled with 40 µM biotin-16-dUTP (Roche) in the presence of 
either 5 µM aphidicolin or DMSO as control for 10 min. DNA replication was 
stopped by the addition of 200 µl cold EB-EDTA buffer (50 mM HEPES-KOH 
pH 7.5, 100 mM KCl, 2.5 mM MgCl₂, 1 mM EDTA). Samples were homogenized 
by using a cut Eppendorf p1000 pipette tip and overlaid on 600 µl EB-EDTA 
buffer containing 30% (w/v) sucrose. Nuclei were collected by centrifugation at 
8,300g at 4 °C, the nuclear pellets were resuspended in 400 µl EB-NP40 buffer 
(50 mM HEPES-KOH pH 7.5, 100 mM KCl, 2.5 mM MgCl₂, 0.25% NP40) and 
then subjected twice to 10 min sonication with a Bioruptor device set to maxi- 
mum power. After the sonication step, 20 µl from each sample were set aside 
(5% input). Biotinylated DNA fragments were then pulled down by incubation 
with 40 µl Dynabeads M-280 Streptavidin for 1 h at 4 °C. The beads were then 
washed three times in EB-EDTA buffer and boiled with 30 µl of 1× denaturing 
loading buffer, and the entire volume was separated by SDS-PAGE and analysed 
by western blotting. 
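The "5% input" value follows directly from the volumes stated above: 20 µl is set aside from a 400 µl nuclear resuspension. A minimal Python sketch of that arithmetic, with a hypothetical function name chosen for illustration:

```python
def input_fraction_percent(aliquot_ul, total_ul):
    """Percentage of a sample represented by an aliquot set aside as input."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return 100.0 * aliquot_ul / total_ul


if __name__ == "__main__":
    # 20 µl taken from the 400 µl EB-NP40 resuspension described in the text.
    print(input_fraction_percent(20, 400))
```

The same helper can be used to check how band intensities in the pulldown lane should be scaled when they are compared with the input lane.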

Protein purification. 10x His-SAMHD1 was overexpressed from pET19b as pre- 
viously described‘. All purification steps were carried out at 4 °C. The cell pellet 
was resuspended and lysed in lysis buffer (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 
1 mM β-mercaptoethanol, 1 mM EDTA, protease inhibitors) and centrifuged for 
1 h at 35,000g. Soluble extract was incubated with pre-equilibrated Ni-NTA agarose 
resin (Qiagen) for 2 h with continuous mixing. The resin was washed with buffer 
T (20 mM Tris-HCl pH 7.5, 10% glycerol, 1 mM EDTA) and bound protein was 
eluted with a gradient of 100-500 mM imidazole in buffer T. The eluted fractions 
were diluted in buffer T, loaded onto a 1 ml SourceS column and the protein was 
eluted with a 10-ml gradient of 100-700 mM KCl in buffer T. The peak fractions 
were stored in small aliquots at −80 °C. 

MRE11-6×His and yMRE11 were purified from Sf9 insect cells or yeast 
cells, respectively, as described previously**. DNA2 was purified from Sf9 cells as 
described previously”? and FEN1 was expressed and purified by the same protocol 
as described for yFEN1*°. Bacterial ExoIII was purchased from Thermo Fisher 
Scientific (EN0191). 

Nuclease assays. Fluorescently labelled DNA substrates used for nuclease assays 
were prepared by annealing as described elsewhere*!. The concentrations of 
SAMHD1 indicated in the main text were pre-incubated for 5 min at 37 °C with 
MRE11 (4 nM) or yMRE11 (4 nM) followed by addition of a DNA substrate 
and incubation in buffer P (40 mM KCl, 10 mM Tris-HCl (pH 7.5), 1 mM DTT, 
10 µg ml⁻¹ BSA) in the presence of 2 mM dGTP, 2 mM MnCl₂ and 2 mM MgCl₂ 
for 50 min at 37 °C. The reaction mixtures were then incubated with 0.1% SDS and 
500 µg ml⁻¹ proteinase K at 37 °C for 20 min and heat-denatured, and the DNA was 
resolved on 20% denaturing PAGE gels (acrylamide:bisacrylamide, 19:1) and 
scanned with a Fuji FLA 9000 imager. Where indicated, gels were quantified 
using Multi Gauge v.3.2 software (Fuji). For nuclease assays with DNA2, reactions 
were carried out as for MRE11 and yMRE11 but using 0.4 nM DNA2 in buffer 
P containing 2 mM ATP, 2 mM MgCl₂ and 2 mM dGTP. Assays of FEN1 (1 nM) 
and ExoIII (20 ,U µl⁻¹) were carried out in buffer P containing 2 mM MgCl₂ 
and 2 mM dGTP. 

Electrophoretic mobility shift assays. Fluorescently labelled DNA substrates were 
prepared as for the nuclease assays*!. SAMHD1 was incubated with fluorescently 
labelled DNA substrate for 10 min at 37 °C in the presence of 30 mM KCl, 10 mM 
Tris-HCl pH 7.5, 1 mM DTT, 2 mM MgCl₂. Protein-DNA complexes were cross- 
linked by incubation with 0.01% glutaraldehyde for 5 min at 37 °C and resolved 
on 0.8% agarose gels. The gels were scanned with a Fuji FLA 9000 imager and 
quantified using Multi Gauge v.3.2 software (Fuji). 

Microscale thermophoresis. For MRE11-SAMHD1 interaction measurements, 
MRE11 was fluorescently labelled by using Monolith Protein Labelling Kit RED- 
MALEIMIDE (Cysteine Reactive) according to the manufacturer’s protocol. 
Measurements were performed with 8 nM labelled MRE11 and the indicated 
concentration of SAMHD1 in PBS containing 0.05% Tween. Thermophoresis 
was performed on a Monolith NT.115 (NanoTemper Technologies GmbH) set 
at 50% LED and 80% MST power at 25 °C and with 5 s and 30 s laser off and 
on times, respectively. For RPA-SAMHD1 interaction measurements, 10×His- 
SAMHD1 was labelled by using the Monolith His-Tag Labelling Kit RED-tris-NTA 
according to the manufacturer’s protocol. The thermophoresis measurements were 
performed with 50 nM SAMHD1 and increasing concentrations of RPA in PBS 
containing 0.05% Tween and 2 mM MgCl₂ on a Monolith NT.115 set at 40% LED and 
80% MST power at 25 °C and with 5 s and 30 s laser off and on times, respectively. 
Data were analysed by using MO.Affinity Analysis Software v.2.2.4 (NanoTemper 
Technologies GmbH). 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The authors declare that the data supporting the findings of this 
study are available within the Article and its Supplementary Information files or 
are available from corresponding authors. 
Extended Data Fig. 1 | SAMHD1-depleted cells secrete IFNs. a, Western 
blot analysis of SAMHD1 in HEK293T cells expressing SAMHD1 shRNA 
(shSAM) or a scrambled control (shScr). b, Cytosolic ssDNA (red) in 
shScr and shSAM HEK293T cells and in HEK293T cells transfected with 
TREX1 siRNA (siTREX1). Scale bar, 5 µm. n = 5. c, HL116 cells containing 
an IFN-stimulated response element-luciferase reporter gene were 
incubated with culture medium from shScr or shSAM HEK293T cells for 
48 h. Mean luciferase activity and s.d. from four independent experiments 
are shown. d, HeLa cells were transfected for 48 h with control siRNA 
(siCtrl) or SAMHD1 siRNA (siSAM). Expression of IFNA, IFNG and 
the IFN-stimulated genes MX1 and ISG15 was quantified by qRT-PCR. 
Data are representative of three independent experiments. Error bars 
denote s.d. for a representative triplicate experiment. e, shScr or shSAM 
HEK293T cells were treated with 4 mM HU for the indicated times and 
then transferred to fresh medium for a total of 20 h. Culture medium 
was collected and incubated with HL116 cells for 48 h before luciferase 
assay. Data shown are representative of three independent experiments. 
f, Quantification of the mean fluorescence intensity of cytosolic BrdU in 
the experiment shown in Fig. 1e. Quantification was performed on 250 
cells by using CellProfiler. Median BrdU intensity is indicated in red. 
****P < 0.0001, Mann-Whitney rank sum test. g, shScr and shSAM 
HEK293T cells were treated for 2 h with 4 mM HU or for 6 h with 40 µM 
oligomycin, used here to damage mitochondria. Cells were labelled with 
the mitochondria-selective dye MitoTracker (Invitrogen). The integrity of 
mitochondria was assessed by confocal microscopy. Representative images 
are shown. Scale bar, 5 µm. h, The abundance of mitochondrial COX1 
DNA in cytosolic DNA isolated from cells treated as in g was quantified by 
qPCR and normalized to GAPDH. Mean and s.d. from three independent 
experiments are shown. i, Levels of STING mRNA in cGAS-knockout, 
STING-knockout and control HeLa cells transfected with SAMHD1 siRNA 
were measured by qRT-PCR 48 h after transfection. Mean and s.d. from 
three independent experiments are shown. j, Levels of cGAS protein in 
cGAS-knockout, STING-knockout and control HeLa cells transfected 
with SAMHD1 siRNA were monitored by western blotting 48 h after 
transfection (n = 3). k, Levels of IRF3 protein in SAMHD1-depleted 
HeLa cells co-transfected with siRNAs against STING or IRF3 (n = 3). 
l, HeLa cells were co-transfected for 48 h with siRNAs against SAMHD1 
and STING or IRF3. Levels of ISG15 and MX1 mRNA were analysed by 
qRT-PCR. Mean and s.d. from three independent experiments are shown. 
m, Expression levels of STING mRNA in HeLa cells co-transfected with 
siRNAs against SAMHD1 and STING or IRF3. Mean and s.d. for three 
independent experiments are shown. n, HeLa cells were co-transfected for 
48 h with siRNAs against SAMHD1 and STING or IRF3. They were then 
treated with 4 mM HU for 8 or 18 h, washed with PBS and further cultured 
in fresh medium for 18 h. Expression of ISG15 mRNA was quantified by 
qRT-PCR. Data are representative of three independent experiments. 
Mean and s.d. correspond to triplicates of a representative experiment. 
o, HEK293 cells (STING siRNA) and CRISPR-Cas9-mediated STING- 
knockout THP-1 cells were transfected with SAMHD1 siRNA. Expression 
of MX1 mRNA was quantified by qRT-PCR. Mean and s.d. from three 
independent experiments are shown. 
Extended Data Fig. 2 | SAMHD1 localizes to replication foci and 
binds nascent DNA. a, EdU (red) and SAMHD1 (green) foci in HeLa 
cells expressing HA-tagged SAMHD1 (SAMHD1-HA) or GFP-HA. 
Immunofluorescence microscopy was performed as indicated in Fig. 2a. 
Scale bar, 5 µm. n = 2. b, HeLa cells were labelled with 10 µM EdU for 
10 min then processed for iPOND analysis. Proteins associated with 
nascent DNA were analysed by mass spectrometry. The number of 
peptides from SAMHD1 and other factors found associated with nascent 
DNA is indicated. The table summarizes the data from two independent 
experiments. c, Measurement of intracellular dNTP pools in shScr and 
shSAM HEK293T cells (two independent experiments). d, Production 
of recombinant His-xSamhd1 (see Supplementary Information) and 
characterization of the antibody raised against this protein. e, xSamhd1 
associates with nascent DNA. Xenopus sperm DNA was incubated in 
Xenopus egg extract for 45 min then nascent DNA was labelled for 10 
min with 40 µM biotin-16-dUTP in the absence or presence of 5 µM 
aphidicolin (Aph). Nascent chromatin was isolated on streptavidin 
beads and analysed by western blotting for the proteins indicated 
(see Methods). A representative experiment is shown (n = 2). f, Samhd1 
binds chromatin in response to DSBs and aphidicolin. Left, Xenopus 
egg extracts were treated with the indicated doses of aphidicolin (µM), 
topotecan (T, 100 µM) or EcoRI (U µl⁻¹). Chromatin-bound proteins 
were then analysed by western blotting. Right, histograms show relative 
signal intensity of Samhd1 on chromatin. Data are mean and s.d. from four 
independent experiments. 
Extended Data Fig. 3 | Fork progression is altered in the absence of 
SAMHD1. a, Western blot analysis of SAMHD1 protein in SAMHD1- 
depleted HeLa cells (using shSAM) and in SAMHD1-depleted HEK293T 
cells expressing an shRNA-resistant, full-length SAMHD1 (SAM) under 
the control of a doxycycline-inducible promoter. Expression of SAMHD1 
was analysed 72 h after induction with doxycycline. b, Control (shScr) 
and SAMHD1-depleted (shSAM) HEK293T cells were seeded in 
24-well plates. Cell number was determined by using trypan blue exclusion 
and haemocytometry. Mean and s.d. are shown from three independent 
experiments. c, SAMHD1 is required for normal S-phase progression. 
shScr and shSAM HEK293T cells were pulse-labelled with EdU for 20 
min and chased with thymidine for 4 h before flow cytometry analysis. 
The arrowhead indicates the EdU-labelled cell population that completed 
DNA replication and came back to G1 phase. The percentage of cells in 
G2/M phase is indicated. n = 2. d, shScr and shSAM HeLa cells were 
labelled sequentially for 20 min with IdU and CldU and the length of 
CldU tracks (n = 150) was analysed by DNA fibre spreading. Median track 
lengths are indicated in red. e, Representative images of stretched DNA 
fibres. Red, IdU; green, CldU; blue, DNA. The green channel is shown 
separately for clarity. Scale bar, 4 µm. f, DNA fibres from shScr and shSAM 
HEK293T cells sequentially labelled for 20 min with IdU and CldU were 
either stretched on glass slides (DNA fibre spreading; n = 190) or combed 
on silanised coverslips (DNA combing; n = 130) and then analysed by 
immunofluorescence microscopy. The length distribution of CldU tracks 
is shown. Median track lengths are indicated in red. g, DNA fibres from 
cells treated as in f were stretched by DNA combing and the distance 
between CldU tracks, which is indicative of the density of active origins, 
was determined for five independent experiments. Median distances are 
indicated. Whiskers correspond to 10th-90th percentiles. ****P < 0.0001, 
Mann-Whitney rank sum test. h, SAMHD1-depleted cells show increased 
spontaneous fork arrest. The ratio of the shortest to the longest CldU track 
from cells treated as in d was calculated for each pair of divergent sister 
replication forks and the percentage of sister forks showing a ratio of less 
than 0.6 is shown (n = 75). Error bars indicate s.d. from three independent 
experiments. P < 0.05, Mann-Whitney rank sum test. 
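The fork asymmetry measure in panel h can be sketched as follows: for each pair of sister forks, divide the shorter CldU track by the longer one, then report the percentage of pairs whose ratio falls below 0.6. The function name and the example lengths are hypothetical and serve only to illustrate the calculation.

```python
def asymmetric_fork_percentage(sister_pairs, threshold=0.6):
    """Percentage of sister fork pairs with a short/long track ratio below threshold.

    sister_pairs: iterable of (length_a, length_b) CldU track lengths,
    one tuple per pair of divergent sister forks.
    """
    ratios = []
    for a, b in sister_pairs:
        longest = max(a, b)
        if longest <= 0:
            raise ValueError("track lengths must be positive")
        ratios.append(min(a, b) / longest)
    if not ratios:
        raise ValueError("at least one sister fork pair is required")
    n_asymmetric = sum(1 for r in ratios if r < threshold)
    return 100.0 * n_asymmetric / len(ratios)


if __name__ == "__main__":
    # Hypothetical track lengths (µm) for four sister fork pairs.
    pairs = [(10.0, 10.0), (5.0, 10.0), (8.0, 10.0), (3.0, 9.0)]
    print(asymmetric_fork_percentage(pairs))
```

A ratio near 1 indicates that both forks from an origin travelled a similar distance, whereas a low ratio suggests that one of the two forks stalled or slowed during the labelling pulse.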
Extended Data Fig. 4 | SAMHD1 promotes fork progression 
independently of dNTP pools. a, Map of SAMHD1 protein domains 
indicating the positions of the mutations analysed in this study. The level 
of phospho-SAMHD1 (T592) in HEK293T cells collected by FACS in G1, 
S and G2/M phases was determined with a phospho-specific antibody. 
Levels of wild-type and mutant SAMHD1 were also analysed by western 
blotting after induction of the genes with doxycycline (Dox) for 72 h. 
n = 4. b, SAMHD1-depleted HEK293T cells were complemented with 
the phosphomimetic (T592E) or non-phosphorylatable (T592A) mutants 
of SAMHD1. Intracellular dNTP pools were quantified as described in 
Methods. Median and s.d. are shown for five independent experiments. 
c, Wild-type SAMHD1 or the K312A and Y315A mutant forms were 
expressed in SAMHD1-depleted (shSAM) HEK293T cells and the cells 
were labelled sequentially with IdU and CldU for 15 min each. The lengths 
of the CldU tracks (n = 180) were measured on spread DNA fibres. 
d, Immortalized B cells from a SAMHD1−/− patient with a homozygous 
Q548X mutation or a healthy donor (WT) were labelled with IdU and 
CldU and the lengths of CldU tracks (n = 240) were measured as in c. 
****P < 0.0001, Mann-Whitney rank sum test. 
Extended Data Fig. 5 | Role of SAMHD1 in the degradation of nascent 
DNA at HU-arrested forks. a, Control (shScr) and shSAM HEK293T cells 
were labelled with IdU for 15 min and then exposed to 4 mM HU for 30, 
60 or 120 min in the presence of CldU. The lengths of the IdU tracks 
(n = 230) were measured on spread DNA fibres. b, shScr and shSAM 
HEK293T cells were sequentially labelled for 15 min with IdU and for 15 
min with CldU. Then, they were either collected immediately or treated for 
2 h with 4 mM HU before DNA fibre analysis. The lengths of the IdU and 
CldU tracks (n = 160) were plotted as the ratio of CldU to IdU. c, shScr 
and shSAM HEK293T cells were treated as in b, except that HU was 
replaced with 1 µM aphidicolin (n = 160). d, shScr and shSAM HEK293T 
cells were incubated for 120 min with a balanced mix of nucleosides 
(+-dN) and then labelled with IdU and CldU in the presence of 4 mM HU, 
as indicated. The lengths of IdU tracks (n = 160) were measured on spread 
DNA fibres. e, shScr and shSAM HEK293T cells were treated for 2 h with 
4 mM HU, or not treated, and intracellular dNTP pools were measured 
and expressed relative to intracellular rNTP pools in two independent 
experiments. f, shScr and shSAM HEK293T cells were transfected for 
48 h with an siRNA against CtIP (siCtIP) or a control (siCtrl) and then 
labelled with IdU and CldU in the presence of 4 mM HU, as indicated. 
The lengths of the CldU and IdU tracks (n = 160) were measured on 
spread DNA fibres and plotted as the ratio of CldU to IdU track lengths. 
g, shScr and shSAM HEK293T cells were transfected with siRNA against 
SMARCAL1 or with a siCtrl for 48 h and then labelled with IdU and CldU 
in the presence of 4 mM HU, as indicated. The lengths of the CldU and 
IdU tracks (n = 200) were measured on spread DNA fibres and plotted as 
the ratio of CldU to IdU. h, Wild-type SAMHD1, the dNTPase-deficient 
mutant K312A or dNTPase-proficient mutant Y315A were expressed in 
shSAM HEK293T cells and the cells were labelled with IdU for 15 min 
and then with CldU in the presence of 4 mM HU for 2 h, as indicated. The 
lengths of the IdU tracks (n = 140) were measured on spread DNA fibres. 
i, Immortalized B cells from a SAMHD1−/− patient with a homozygous 
Q548X mutation or a healthy donor (WT) were labelled with IdU and CldU 
in the presence of HU, as in h. The lengths of the IdU tracks (n = 200) were 
measured on spread DNA fibres. In a-i, median track lengths are indicated 
in red. *P < 0.05, ****P < 0.0001, Mann-Whitney rank sum test. 
Extended Data Fig. 6 | SAMHD1 promotes the resection of DNA DSB 
ends. a, Analysis of DNA end resection at the level of individual DNA 
fibres with the SMART assay. Control (shScr) and SAMHD1-depleted 
(shSAM) HEK293T cells were grown for 24 h in the presence of BrdU to 
label genomic DNA and DSBs were induced with 5 µg ml⁻¹ bleocin for 
1 h. Cells were then washed and collected at the indicated times. DNA 
fibres were spread on glass slides and BrdU was detected without DNA 
denaturation. Representative images (three independent experiments) 
of BrdU tracks (red) 1 h after bleocin removal are shown. Scale bar, 
5 µm. b, Quantification of BrdU track lengths (n = 200) in shScr and 
shSAM HEK293T cells treated with bleocin as in a. Median track lengths 
are indicated in red. ****P < 0.0001, Mann-Whitney rank sum test. 
c, Schematic of the U2OS single-strand annealing cell assay for DNA 
DSB repair. These cells carry a reporter vector in which an I-SceI site has 
been incorporated into a GFP gene, such that single-strand annealing- 
mediated repair events result in GFP fluorescence. d, U2OS single-strand 
annealing cells were transfected with siRNAs against SAMHD1, CtIP, or 
both, or with a control scrambled siRNA (Scr). They were then transfected 
with a plasmid expressing HA-tagged I-SceI under the control of a CMV 
promoter. Percentages of GFP-positive cells were quantified by flow 
cytometry and were normalized to the control cells. Error bars denote 
s.d. of three independent experiments. e, Expression of SAMHD1, CtIP 
and HA-tagged I-SceI in the experiment shown in d was monitored by 
western blotting (n = 3). f, Xenopus sperm DNA was incubated in Xenopus 
egg extract in the presence of 0.05 U µl⁻¹ of EcoRI for the indicated times 
then the chromatin was purified and analysed by western blotting for the 
indicated proteins. A representative experiment is shown (n = 3). 
Extended Data Fig. 7 | SAMHD1 binds MRE11 and stimulates its 
[…] (ssDNA, dsDNA, dsDNA with a 5’ overhang, forked DNA and reversed 
[…] (EMSA) as described in the Supplementary Information. Representative 
gel shift images are shown (n = 3). b, Quantification of the EMSA in a. 
c, SAMHD1 stimulates the nuclease activity of yeast MRE11, but not 
[…] s.d. of three independent experiments. d, e, SAMHD1 binds MRE11 
and RPA as monitored by microscale thermophoresis assay. Error bars 
[…] of Samhd1 and RPA from Xenopus egg extracts. Western blots are 
[…] A representative experiment is shown (n = 2). g, Co-immunoprecipitation 
of Samhd1 and CtIP from Xenopus egg extracts, as in f and described in 
the Supplementary Information. A representative experiment is shown […] 
Extended Data Fig. 8 | SAMHD1 is required to recruit MRE11 and 
RPA to stalled forks and to activate the replication checkpoint. a, Top, 
control (shScr) and SAMHD1-depleted (shSAM) HEK293T cells were 
labelled with EdU for 10 min then grown without EdU for a further 2 h in 
the presence of 4 mM HU. Chromatin-bound MRE11 foci were detected by 
confocal microscopy and compared to EdU foci. Representative confocal 
images are shown. Scale bar, 5 µm. Bottom, the co-localization of EdU and 
MRE11 foci was quantified by using the JACoP plugin of ImageJ (n = 55). 
Whiskers indicate the 10th and 90th percentiles. ****P < 0.0001, Mann- 
Whitney rank sum test. b, Top, foci (arrowheads) of ssDNA (red) in the 
nuclei (blue) of HU-treated shScr and shSAM HEK293T cells labelled 
for 24 h with 10 µM BrdU. Representative images from two independent 
experiments are shown. Asterisks indicate BrdU-labelled cytosolic ssDNA. 
Bottom, quantification of nuclear and cytosolic BrdU foci per cell as in a. 
c, Top, shScr and shSAM HEK293T cells were incubated with 10 µM EdU 
for 10 min and then treated for 2 h with 4 mM HU. RPA foci (green) were 
detected by using an anti-RPA1 antibody after CSK extraction. Bottom, 
mean fluorescence intensity of chromatin-bound RPA1 was quantified 
from 70 EdU-positive cells by using CellProfiler. ****P < 0.0001, Mann- 
Whitney rank sum test. Scale bar, 5 µm. d, Immunoblots of CHK1 
phosphorylated on S345 (p-CHK1), γ-H2AX phosphorylated on S139, 
and CHK2 phosphorylated on T68 (p-CHK2) in shScr and shSAM 
HEK293T cells treated for 60 min with 0.25 mM HU or 1 µM CPT (n = 3). 
e, Expression of CHK1 and CHK2 proteins after HU or CPT treatment as 
indicated in d. f, shScr and shSAM HEK293T cells, and shSAM HEK293T 
cells expressing full-length SAMHD1 (shSAM + SAMHD1) were tested 
for their ability to phosphorylate CHK1 on S345 upon exposure to 0.25 
or 1 mM HU for 2 h. The fold of induction was normalized to untreated 
cells by quantifying the bands on the blots and calculating the ratios of 
p-CHK1 to SAMHD1 (n = 3). g, shSAM HEK293T cells expressing the 
non-phosphorylatable mutant T592A or the phosphomimetic mutant 
T592E of SAMHD1 were treated with 0.25 mM or 1 mM HU for 2 h. The
amounts of p-CHK1 and SAMHD1 were analysed by immunoblotting
and quantified as described in f; n = 2. h, shSAM HEK293T cells
expressing the exonuclease positive (K312A) or negative (Y315A) mutants
of SAMHD1 were treated with the indicated doses of HU for 2 h. The
amounts of p-CHK1 and SAMHD1 were analysed by immunoblotting and
quantified as described in f; n = 2. i, Depletion of Samhd1 from Xenopus
egg extracts impairs the phosphorylation of Chk1 in response to EcoRI
(0.05 U µl⁻¹), aphidicolin (20 µM) or etoposide (Eto, 30 µM; n = 3).
j, Samhd1 and Mre11 are required to activate Chk1 in Samhd1-depleted
or mock-depleted Xenopus egg extracts treated with 0.05 U µl⁻¹ of EcoRI.
Mre11 was inhibited with 100 µM mirin; this experiment was performed
once. k, CHK1 activation in Samhd1-depleted Xenopus egg extracts upon
formation of DSBs by addition of 0.05 U µl⁻¹ EcoRI can be restored by
the addition of approximately 25 nM of recombinant His-xSamhd1; a
representative experiment is shown (n = 2). l, DNA fibre analysis of fork
restart in shScr and shSAM HEK293T cells treated for 18 h with 10 µM
mirin and for 2 h with 1 µM CPT. Fork restart (that is, formation of red
and green tracks) was monitored 30 min after CPT removal (n = 70; three
independent experiments). ****P < 0.0001, Mann-Whitney test.

Extended Data Fig. 9 | Depletion of RECQ1 prevents accumulation
of cytosolic ssDNA in SAMHD1-depleted cells. a, RECQ1, CtIP,
BLM, DNA2 and WRN proteins remaining 48 h after transfection with
the corresponding siRNAs (n = 2). b, SAMHD1-depleted (shSAM) 
HEK293T cells were transfected with the indicated siRNAs for 48 h and 
then treated with 4 mM HU for 2 h. Cytosolic ssDNA was visualized by 
immunofluorescence microscopy. Representative confocal microscopy 
images are shown (n = 2). Scale bar, 5 µm. c, Depletion of RECQ1
prevents the accumulation of cytosolic ssDNA in untreated shSAM cells.
Cytosolic ssDNA was detected by immunofluorescence microscopy.
Representative images are shown (n = 3). Scale bar, 5 µm. d, Mean
fluorescence intensity of cytosolic ssDNA in the cells in c was quantified by
using CellProfiler (n = 130). ****P < 0.0001, Mann-Whitney rank sum 
test. e, RECQ1 and SAMHD1 proteins remaining after siRNA transfection, 
as analysed by western blotting (n = 3). f, shScr and shSAM HEK293T
cells were transfected with siRNA against RECQ1 (siRECQ1) for 48 h.
Expression of IFNB mRNA was quantified by qRT-PCR. Data shown are
representative of three independent experiments. Error bars denote s.d.
of triplicates. g, HeLa cells were transfected with siRNAs as indicated.
They were cultured in the absence or presence of 4 mM HU for 8 h and
then without HU for a total of 20 h before mRNA extraction. Expression
of IFNA mRNA was analysed by qRT-PCR and normalized to GAPDH
mRNA. Data shown are representative of three independent experiments.
Error bars denote s.d. of triplicates. h, shScr and shSAM HEK293T cells
were transfected with siRNA against RECQ1 for 48 h. The cells were
sequentially labelled for 15 min with IdU and for 15 min with CldU. Then, 
they were either collected immediately or treated for 2 h with 4 mM HU 
before DNA fibre analysis. The lengths of the IdU and CldU tracks (n = 200) 
were plotted as the ratio of CldU to IdU. Median values are indicated in 
red. ****P < 0.0001, Mann-Whitney rank sum test.

Extended Data Fig. 10 | SAMHD1 depletion promotes accumulation of
cytosolic nascent DNA under replication stress. a, SAMHD1-depleted
HEK293T cells were labelled for 2 h with IdU and then for 2 h with CldU
in the presence of 4 mM HU. They were then chased with thymidine
for the indicated times and cytosolic IdU (red) and CldU (green) were
visualized by confocal immunofluorescence microscopy. Representative
images are shown. Scale bar, 5 µm. b, Quantification of the cytosolic
IdU and CldU signals in a by using CellProfiler (n = 300). c, DNA fibre
analysis showing the increasing length of CldU tracks in HU-treated
SAMHD1-depleted HEK293T cells. Median track lengths (n = 190) are
indicated in red. d, Model of the role of SAMHD1 at stalled replication
forks. In SAMHD1-proficient cells (top), phosphorylation of SAMHD1 by 
the cyclin A (CycA)-CDK contributes to the MRE11-dependent resection 
of stalled replication forks and activates the ATR-CHK1 pathway at RPA- 
coated ssDNA, together with the DNA repair enzyme TopBP1 and the 
9-1-1 (Rad9-Hus1-Rad1) complex. In SAMHD1-deficient cells (bottom), 
nascent DNA is displaced by the RECQ1 helicase and cleaved by an
endonuclease, such as MRE11. The resulting ssDNA fragments accumulate
in the cytosol and activate the type I IFN response.

Cryo-EM structure of the gasdermin A3 membrane pore


Jianbin Ruan!, Shiyu Xia!?, Xing Liu!, Judy Lieberman!? & Hao Wu!?* 


Gasdermins mediate inflammatory cell death after cleavage by caspases or other, unknown enzymes. The cleaved 
N-terminal fragments bind to acidic membrane lipids to form pores, but the mechanism of pore formation remains 
unresolved. Here we present the cryo-electron microscopy structures of the 27-fold and 28-fold single-ring pores formed
by the N-terminal fragment of mouse GSDMA3 (GSDMA3-NT) at 3.8 Å and 4.2 Å resolution, and of a double-ring pore
at 4.6 Å resolution. In the 27-fold pore, a 108-stranded anti-parallel β-barrel is formed by two β-hairpins from each
subunit capped by a globular domain. We identify a positively charged helix that interacts with the acidic lipid cardiolipin.
GSDMA3-NT undergoes radical conformational changes upon membrane insertion to form long, membrane-spanning
β-strands. We also observe an unexpected additional symmetric ring of GSDMA3-NT subunits that does not insert into the
membrane in the double-ring pore, which may represent a pre-pore state of GSDMA3-NT. These structures provide a basis
that explains the activities of several mutant gasdermins, including defective mutants that are associated with cancer. 


The gasdermin (GSDM) family, expressed in the skin, mucosa and 
immune antigen-presenting cells, triggers inflammatory programmed 
cell death (pyroptosis) and inflammatory cytokine secretion'®. There 
are six human GSDMs (GSDMA, GSDMB, GSDMC, GSDMD, GSDME 
(also known as DFNA5) and DFNB59 (also known as pejvakin)), and
ten in mice, including three GSDMAs. GSDMs are cleaved by regulated
processing that removes an inhibitory C-terminal fragment
(GSDM-CT) to allow the N-terminal fragment (GSDM-NT) to bind
to acidic lipids in the inner leaflet of mammalian cell membranes or on
bacterial membranes to form pores. GSDMD is a substrate of inflammatory
caspases!®, which are activated by inflammasomes that recognize
invasive infection or intracellular danger signals”!°, and GSDME
is activated by caspase-3!!. The stimuli and proteases that activate the
other GSDMs are largely unknown. To elucidate the mechanism of
GSDM pore formation, we determined the cryo-electron microscopy
(cryo-EM) structure of the mouse GSDMA3-NT pore.


Cryo-EM structure determination 

We formed human GSDMD-NT and mouse GSDMA3-NT pores
by cleaving the full-length proteins in the presence of
phosphatidylserine-containing and cardiolipin-containing liposomes,
respectively (Fig. 1a, Extended Data Fig. 1a, b). A detergent screen
identified sodium cholate and C12E8 as suitable solubilizing agents
for GSDMA3-NT and GSDMD-NT pores, respectively. Because
GSDMA3-NT pores were more homogeneous in size and shape than
GSDMD-NT pores (Extended Data Fig. 1c, d), we collected cryo-EM
data from native GSDMA3-NT pores using a Talos Arctica microscope
(Extended Data Fig. 1e), and from pores treated with HgCl2 using a
Titan Krios microscope (Extended Data Fig. 1f). HgCl2 treatment was
intended to label free Cys residues for validation of sequence
assignment to the cryo-EM map.

For both datasets, top views of two-dimensional (2D) classified and
averaged images showed mostly 27-fold symmetry (around 62% for
native pores and around 70% for HgCl2-treated pores), but there were
also substantial numbers of pores with 26- (around 16% for native
pores) and 28-fold (around 22% for native pores and around 30% for
HgCl2-treated pores) symmetry, implying heterogeneity of
oligomerization (Fig. 1b-e). For the native dataset, top-view classes of pores
with 27-fold symmetry and all side views were used to perform
three-dimensional (3D) classification without imposing any symmetry. One
major class was further 3D-refined with 27-fold symmetry to a global
resolution of 4.6 Å by gold-standard Fourier shell correlation (FSC) in
Relion”” (Fig. 1c, Extended Data Fig. 2a, Extended Data Table 1). For
the HgCl2-treated dataset, two rounds of 3D classification followed by
3D refinement led to maps at 3.8 Å and 4.2 Å resolution for the 27-fold-
and 28-fold-symmetry pores, respectively (Fig. 1d, e, Extended Data
Fig. 2b, c, Extended Data Table 1). Unexpectedly, the 4.6 Å cryo-EM
map from native pores contains two rings with 27-fold symmetry, with
the top ring representing a membrane pore and the bottom ring showing
no membrane insertion (Fig. 1c). By contrast, the cryo-EM maps from
HgCl2-treated pores contain only the top ring (Fig. 1e). Local
resolution distribution calculated in ResMap'? showed better resolution at
the putative membrane-inserted region, which contained extended
β-hairpins, in comparison to the juxtamembrane globular domain
(Extended Data Fig. 2d, e).


Overall structure of the membrane pore 

Fig. 1 | Structure determination of GSDMA3 pores. a, Cartoon diagram of
human GSDMD and mouse GSDMA3 constructs used for in vitro pore
reconstitution. b, 2D classification of GSDMA3 pores showing the presence of
26-, 27- and 28-fold symmetric pores. Each image shows an area of 400 × 400 Å².
c, 3D reconstruction of the 27-fold symmetric double-ring pore to 4.6 Å
resolution. d, 2D classes of HgCl2-treated GSDMA3 pores showing mostly single
rings and a minor population of double rings. e, 3D reconstruction of 27- and
28-fold symmetric single-ring GSDMA3 pores to 3.8 Å and 4.2 Å resolution,
respectively.

Analysis of the cryo-EM maps of the 27-fold membrane pore revealed
a very large structure with an inner diameter of around 180 Å, an outer
diameter of around 280 Å and a height of around 70 Å. We focus our
discussion on the 3.8 Å-resolution pore, which had clearly defined
secondary structures and large side chains (Fig. 2a, Extended Data Fig. 3,
Supplementary Video 1). On the basis of the top views of 2D averages
(Fig. 1b, d), the 26- and 28-fold pores would have similar dimensions
that are only marginally smaller and larger, respectively. However,
previously reported GSDMA3 pores assembled on cardiolipin-containing
monolayer membranes appear to have 16-fold symmetry
and a smaller interior width of 100-140 Å, despite the similar outer
diameter of around 30 nm, as determined from 2D averages in
negative-staining electron microscopy’. Since native GSDMs assemble on
lipid bilayers, which are present in our liposomes, rather than
monolayers, and the determination of symmetry and dimensions from
low-resolution averages may be inaccurate, the structure we obtained
here is more likely to resemble the architecture of the pore in the plasma
membrane.
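
The remark that the 26- and 28-fold pores are only marginally smaller and larger than the 27-fold pore can be checked with a back-of-the-envelope estimate. The sketch below is not taken from the paper: it assumes that every subunit occupies the same arc length along the inner circumference and that the wall thickness (half the difference between the reported outer and inner diameters) does not depend on the symmetry. The function name `scaled_pore` is hypothetical.

```python
# Rough scaling of pore dimensions with subunit number.
# Assumptions (not from the paper): each subunit spans a fixed arc of the
# inner circumference, and the wall thickness is independent of symmetry.

REF_FOLD = 27        # symmetry of the reference pore
REF_INNER = 180.0    # reported inner diameter, angstroms (approximate)
REF_OUTER = 280.0    # reported outer diameter, angstroms (approximate)


def scaled_pore(fold, ref_fold=REF_FOLD, ref_inner=REF_INNER, ref_outer=REF_OUTER):
    """Return (angular step in degrees, inner diameter, outer diameter)."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    wall = (ref_outer - ref_inner) / 2.0   # wall thickness, held constant
    inner = ref_inner * fold / ref_fold    # circumference scales with fold
    outer = inner + 2.0 * wall
    return 360.0 / fold, inner, outer


if __name__ == "__main__":
    for n in (26, 27, 28):
        step, inner, outer = scaled_pore(n)
        print(f"{n}-fold: {step:.2f} deg per subunit, "
              f"inner ~{inner:.0f} A, outer ~{outer:.0f} A")
```

Under these assumptions the inner diameter changes by one part in 27 (less than 4%) per added or removed subunit, in line with the qualitative description in the text.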

Fig. 2 | Structure and conformational changes of the GSDMA3 pore.
a, Cryo-EM map (grey) superimposed onto the atomic model of the
27-fold symmetric GSDMA3 pore at 3.8 Å resolution. b, Ribbon diagram
of GSDMA3-NT in the pore conformation (left) and crystal structure of
auto-inhibited GSDMA3} (right). c, Superposition of the auto-inhibited
form and the pore form of GSDMA3-NT. d, Structural transitions that
accompany the formation of the two β-hairpins HP1 and HP2 in the pore
conformation. ED1 and ED2, extension domains 1 and 2, respectively.

Fig. 3 | Mechanism of lipid recognition. a, Charge distribution in
the GSDMA3 pore, viewed from the membrane side (top) and the
pore interior (bottom). TM, transmembrane. b, Chemical structure of
cardiolipin showing two phosphatidic acid moieties in the head group. The
four R groups represent C17H33. c, Cryo-EM map of the GSDMA3 pore
(grey) with the difference density map for potential bound lipid (green).
d, Cardiolipin head group fitted into the lipid density in proximity to basic
residues in the α1 helix. e, Potential lipid-binding patch. Left and middle,
the auto-inhibited GSDMA3 structure!, shown in electrostatic surface
for the N-terminal domain and grey ribbon for the C-terminal domain

The predominant GSDMA3 pore with 27-fold symmetry is a complete
anti-parallel β-barrel composed of 108 β-strands (Fig. 2a). Each
subunit in the pore contributes four clearly separated long β-strands
as two anti-parallel β-hairpins that align in a manner similar to the
digits of a left hand to form the transmembrane region (Fig. 2a, b). The
β-strands that line the pore are from 15 to 22 residues long, and define
a central β-barrel about 50 Å high, sufficient to traverse a lipid bilayer
composed of lipid acyl chains with a thickness of about 30-40 Å. A
previous analysis of transmembrane β-strands suggested that β-strands
of between 6 and 22 residues in length can span a membrane“. The
GSDMA3-NT globular region, or the ‘palm’ of the hand, shapes the rim
of the pore and contains both α- and β-elements, with the prominent
α1 helix acting as the thumb (Fig. 2b).
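
The numbers in this paragraph can be tied together with simple arithmetic. The following sketch assumes a typical rise of about 3.3 Å per residue along an extended β-strand (a textbook value, not stated in the paper) and uses a strand tilt of about 20° relative to the pore axis, as reported for this pore, to estimate how far a strand of a given length reaches along the axis. The helper names `total_strands` and `axial_span` are hypothetical.

```python
import math

SUBUNITS = 27               # symmetry of the predominant pore
STRANDS_PER_SUBUNIT = 4     # two beta-hairpins, two strands each
RISE_PER_RESIDUE = 3.3      # angstroms per residue; assumed textbook value
TILT_DEG = 20.0             # strand tilt relative to the pore axis (approximate)


def total_strands(subunits=SUBUNITS, per_subunit=STRANDS_PER_SUBUNIT):
    """Number of strands in the complete barrel."""
    return subunits * per_subunit


def axial_span(n_residues, rise=RISE_PER_RESIDUE, tilt_deg=TILT_DEG):
    """Length of a tilted strand projected onto the pore axis, in angstroms."""
    return n_residues * rise * math.cos(math.radians(tilt_deg))


if __name__ == "__main__":
    print("strands in barrel:", total_strands())
    for length in (15, 22):
        print(f"{length} residues span ~{axial_span(length):.1f} A along the axis")
```

With these assumed values, strands of 15 to 22 residues project roughly 47 to 68 Å along the pore axis, which brackets the reported barrel height of about 50 Å and exceeds the 30-40 Å thickness of the acyl-chain region.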


Conformational transition upon membrane insertion 

The GSDMA3-NT pore exhibits a radical conformational
change in comparison to its structure in the auto-inhibited,
uncleaved protein (Protein Data Bank (PDB) ID: 5B5R)! (Fig. 2b-d).
We named the secondary structures in the pore form on the basis of
the auto-inhibited structure to facilitate comparison (Extended Data
Fig. 4). Superposition of the two structures resulted in a
root-mean-square deviation (r.m.s.d.) of 4.95 Å for 161 aligned Cα atoms (Fig. 2c).
Whereas the globular domain is largely unaltered, with a superimposed
r.m.s.d. of 0.95 Å, marked conformational changes are associated with
the formation of the two membrane-inserted β-hairpins. The entire
β3-β4-β5 region extends into the first tight transmembrane β-hairpin,
and the α3′ helix and the disordered region preceding it in the
auto-inhibited conformation are pulled into the new β3 strand (Fig. 2c, d,
Supplementary Video 2). The entire β7-α4-β8 region stretches out into
the second transmembrane β-hairpin, including the associated loops
and disordered segments in the auto-inhibited structure (Fig. 2c, d,
Supplementary Video 2). The α4 helix that binds to the C-terminal
fragment (GSDMA3-CT) in the uncleaved structure (Fig. 2d) forms
part of the tip of the second β-hairpin. Because of the structural
transformation, we named the β4 and α3′ region ‘extension domain
1’ (ED1), the α4 region ‘extension domain 2’ (ED2), and the newly
formed long β-strands ‘hairpin 1’ (HP1) and ‘hairpin 2’ (HP2) (Fig. 2d).
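
For readers less familiar with the r.m.s.d. values quoted above, the quantity is the square root of the mean squared distance between paired atoms after two structures have been superposed. The sketch below only computes this final statistic for coordinates that are already superposed and paired; it does not perform the alignment, and it is not the software used by the authors. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) tuples.

    The coordinates are assumed to be superposed already; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: three atoms displaced uniformly by 1 angstrom along z.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    displaced = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(f"r.m.s.d. = {rmsd(reference, displaced):.2f} A")
```

A small value for the globular domain alongside a much larger value for the whole fragment, as reported here, indicates that most of the displacement is concentrated in the regions that refold into the hairpins.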


(left), and in cyan ribbon for the N-terminal domain and electrostatic
surface for the C-terminal domain (middle). Right, the N-terminal
domain in its auto-inhibited form shown as a surface diagram with the
positively charged patches in blue. f, The basic patch in the pore form
contains residues from helices α1 and α3 and the β1-β2 region. Relevant
residues are labelled for GSDMA3-NT (left) and for a model of human
GSDMD-NT (right). g-k, Effects of GSDMA3 (g-i) and GSDMD
(j, k) mutations on liposome-leakage activities, monitored by measuring
dipicolinic acid (DPA)-chelating-induced fluorescence of released Tb3+
ion (n = 3 biological replicates). Error bars denote mean ± s.d.


Acidic lipid binding by the al helix 

Analysis of the electrostatic surface of the GSDMA3-NT pore revealed
that in the transmembrane region, the side of the pore that faces the
membrane is largely hydrophobic, as expected, whereas hydrophilic
residues form positively and negatively charged stripes on the inner side
of the pore conduit (Fig. 3a). A positively charged surface patch also
forms adjacent to the proposed transmembrane region, in the globular
domain of the pore (Fig. 3a). When we used a difference density map to
locate additional densities, we found a strong density at 6.6σ adjacent
to the α1 helix, which fits well to the double phosphate head group of
cardiolipin (Fig. 3b-d), the acidic lipid we used to reconstitute the pore
on liposomes. We hypothesize that the modelled cardiolipin head group
interacts with the basic residues on helix α1 (Fig. 3d).

To confirm the role of helix α1 in lipid interaction, we inspected
the membrane-proximal, positively charged surface of the GSDMA3
pore, which is made up of basic residues from helix α1, helix α3 and
strands β1 and β2 of the pore form of GSDMA3-NT (Fig. 3e, Extended
Data Fig. 4). These regions are masked by GSDMA3-CT in the
auto-inhibited full-length structure (Fig. 3e), consistent with binding of
GSDM-NT, but not full-length GSDMA3 or GSDMA3-CT, to acidic
lipids'*8. Mutation to Ala of a cluster of four conserved basic residues
on helices α2 and α3 in mouse GSDMD (GSDMD(4A)) that correspond
to R137, K145, R151 and R153 of human GSDMD (Fig. 3f, right)
compromised lipid binding, liposome leakage and pyroptosis”. When
these mutations were combined, the loss of both R151 and R153 on α3
was the most defective combination, whereas the combined loss of R137
and K145 only minimally impaired cell death”. Two of the four basic
residues in GSDMD(4A) are conserved in GSDMA3 as R132 in α2
and R145 in α3 (Fig. 3f, left). To test the importance of these residues,
we generated R132A and R145A mutants of GSDMA3. Liposome
leakage assays showed that the R132A mutation on α2 barely
compromised pore formation (Fig. 3g), whereas the R145A mutation on α3
or the R132A/R145A double mutation weakened liposome disruption
(Fig. 3h, Extended Data Fig. 5a), consistent with the defective
phenotype of corresponding mutations on GSDMD?.

To test the effect of the positively charged juxtamembrane patches
formed from residues in helix α1 and the β1-β2 region, we generated
a triple mutant of GSDMA3 containing R9A, R13A and R18A
(GSDMA3(α1)), and a quadruple mutant containing R41A, K42A,
R43A and R52A (GSDMA3(β1-β2)) (Fig. 3f, left). The triple mutant
was expressed and could be cleaved by the 3C protease with similar
kinetics to those of the wild-type protein (Extended Data Fig. 5b),
suggesting that the mutation did not radically alter the structure of
the full-length protein, whereas the quadruple mutant could not be
evaluated because it was not expressed. GSDMA3(α1) cleaved by 3C
protease had reduced membranolytic activity in the liposome leakage
assay (Fig. 3i). Corresponding Ala mutations were also introduced
into human GSDMD at R7, R10 and R11 (GSDMD(α1)), and at R42,
K43 and R53 (GSDMD(β1-β2)) (Fig. 3f, right). Both mutants were
expressed and could be purified. Liposome leakage assays showed
that similar to GSDMA3(α1), the GSDMD(α1) mutant also showed
compromised pore formation in comparison to the wild-type protein
(Fig. 3j). The importance of the α1 helix is also supported by the lack of
cell death when part of α1 is deleted*. By contrast, the GSDMD(β1-β2)
mutation did not decrease liposome leakage (Fig. 3k).

Thus, these mutational studies suggest that the α1 and α3
juxtamembrane basic patches, but not basic residues in the β1-β2 region,
are potential candidates for lipid binding. Because α3 also participates
in oligomerization between subunits (see below), and is further away
from the membrane than α1, we argue that the α1 helix is likely to be
the major lipid-binding site for GSDM pore formation. The difference
density near α1 may represent the acidic head group of cardiolipin,
which may interact strongly with GSDMA3-NT and stay associated
with the pore after detergent extraction. We do not know whether the
acidic lipid interaction is only important for inducing conformational
changes in GSDM or whether it is also essential for stabilizing the
membrane pore.

Fig. 4 | Mechanism of oligomerization and membrane insertion.
a, Three contiguous subunits in the GSDMA3 pore. Residues that participate
in oligomerization are labelled in yellow. b, Electrostatic potential map of
a GSDMA3-NT subunit in two side views showing the oligomerization
interface with three major patches labelled. c-e, Main residues on interface
I (c), interface II (d) and interface III (e). f, Oligomerization mutant E14K
exhibits compromised pore formation compared to wild-type (WT)
GSDMA3 (n = 3 biological replicates). Error bars denote mean ± s.d.
g, h, The locations of previously reported and cancer-associated
mutations on both auto-inhibited (g) and pore (h) conformations on
GSDMA3. hA, human GSDMA; hD, human GSDMD; mD, mouse GSDMD.
i, Mutation of L186 in the membrane insertion region abolishes pore
formation and liposome leakage (n = 3 biological replicates). Error bars
denote mean ± s.d.


Oligomerization interface 

Analysis of the oligomerization interface of the GSDMA3-NT pore
revealed extensive interactions of both the inserted β-strands and the
associated globular domains (Fig. 4a). The β-strands have a tilt angle of
about 20° relative to the pore height direction. If one draws the β-strand
hydrogen bonding pattern across the β-barrel, there is a two-residue
shift from one subunit to the next subunit (Extended Data Fig. 6), giving
rise to a total displacement of residues, or a shear number!®, of
27 × 2 = 54 for the entire 27-fold symmetric pore. Each subunit
interface buries a total of 1,600 Å² surface area, with key interaction areas
dividable into three patches, I, II and III (Fig. 4b, Extended Data Fig. 4).
In the first patch, residues between the neighbouring globular domains
form both hydrophobic and charged interactions (Fig. 4c). The
interaction contains mainly residues from helix α3 of one subunit and the
region around α2 and β11 of its neighbouring subunit. In the second
patch, the α1 helix from one subunit juxtaposes end-on with the α1
helix from the next subunit through hydrogen-bonding and hydrophobic
interactions (Fig. 4d). The third major subunit interface runs along
the neighbouring β3 and β8 strands between the subunits (Fig. 4e).
Notably, in the inserted β-strands, residues facing the membrane are
hydrophobic, while those facing the pore are mostly hydrophilic or
charged (Fig. 4e).
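
The shear number and the strand tilt quoted above are linked by the standard geometric description of an idealized β-barrel, in which tan α = S·a/(n·b) and R = b/(2 sin(π/n) cos α), where n is the number of strands, S the shear number, a the rise per residue along a strand and b the distance between neighbouring strands. The sketch below evaluates these relations with the commonly used values a = 3.3 Å and b = 4.4 Å. These constants and the function names are assumptions brought in for illustration, not values reported in the paper.

```python
import math

RISE_A = 3.3      # angstroms per residue along a strand; assumed idealized value
SPACING_B = 4.4   # angstroms between neighbouring strands; assumed idealized value


def barrel_tilt_deg(n_strands, shear, a=RISE_A, b=SPACING_B):
    """Strand tilt relative to the barrel axis, in degrees."""
    return math.degrees(math.atan(shear * a / (n_strands * b)))


def barrel_radius(n_strands, shear, a=RISE_A, b=SPACING_B):
    """Radius of the idealized barrel backbone, in angstroms."""
    alpha = math.atan(shear * a / (n_strands * b))
    return b / (2.0 * math.sin(math.pi / n_strands) * math.cos(alpha))


if __name__ == "__main__":
    subunits = 27
    n = subunits * 4    # four strands per subunit
    s = subunits * 2    # two-residue shift per subunit
    print(f"n = {n}, S = {s}")
    print(f"tilt   ~ {barrel_tilt_deg(n, s):.1f} deg")
    print(f"radius ~ {barrel_radius(n, s):.0f} A")
```

With these idealized constants the tilt comes out close to 20°, consistent with the value measured from the structure. The radius is more sensitive to the assumed inter-strand spacing and should be read only as an order-of-magnitude check against the reported pore dimensions.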

Previous studies showed that E15K and L192D mutations of
GSDMD-NT, and the corresponding mutations E14K and L184D of
GSDMA3-NT (Extended Data Fig. 4), strongly reduced pyroptosis’.
Consistent with these results, we found that the E14K mutant of
GSDMA3 compromised liposome leakage in vitro (Fig. 4f). The
inhibitory effects of E14 and L184 mutations on GSDMA3 cannot
be explained by its full-length structure because E14 is close to the
N-terminal domain–C-terminal domain interface and L184 interacts
closely with the C-terminal domain (Fig. 4g), and mutations in the
C-terminal domain near E14 or L184 constitutively activate GSDMA3
and GSDMD!*. However, in the GSDMA3-NT pore structure, E14
and L184 of GSDMA3 are involved in oligomerization at the globular
domain and the β-barrel domain, respectively (Fig. 4d, e, g, h); this
explains how these mutations compromise pore formation, even
though they reduce auto-inhibition. L184 is also involved in membrane
insertion and its mutation to an acidic residue could therefore also
directly affect this process. To further test the structural model, we
mutated another oligomerization or insertion residue L186 to obtain
the L186E mutant. As we predicted, the mutation almost completely
eliminated the liposome leakage activity of proteolytically cleaved
GSDMA3 (Fig. 4i).

Fig. 5 | Comparison with MACPF/CDC proteins. a, b, Superposition
of GSDMA3-NT with the membrane insertion domain of pneumolysin
in the soluble form (a) and in the pore conformation (b), showing the
alignment of the central β-sheets. The C-terminal tail of GSDMA3-NT
that links to GSDMA3-CT and part of the pneumolysin structure are
omitted for clarity. c, d, Different structural topologies of GSDMA3-NT
(c) and pneumolysin (d) in both soluble and pore forms. ED, extension
domain; HB, helix bundle; HP, β-hairpin. Regions that are straightened by
the insertion are highlighted in red.

A forward genetic screen with randomly mutated mice identified an
I105N mutation of mouse GSDMD as defective in the intracellular LPS
response in vivo, and in membrane permeabilization in vitro”*®. The
I105N mutation in mouse GSDMD corresponds to I104N in human
GSDMD and V101N in GSDMA3 (Extended Data Fig. 4). V101
localizes at β5 of the pore form and is involved in membrane insertion
(Fig. 4h), explaining the impaired function in the mutant GSDMD.
The OASIS cancer genome site (http://www.oasis-genomics.org/) lists
a number of point mutations in human GSDMA and GSDMD from
patient samples, including mutations in GSDMA that correspond to
G103E and N192D in GSDMA3, and GSDMD mutations E15K, E15Q,
P99L, S117L and S210L that correspond to mutations on residues E14,
T98, T112 and K200 in GSDMA3 (Fig. 4d, g, h). E14 of GSDMA3 is at
the oligomerization interface of the pore globular domains, whereas
the remaining residues are all involved in membrane insertion and/or
oligomerization at the β-barrel domain (Fig. 4h), suggesting a
potential mechanism of tumour potentiation through reduced pyroptotic
activity of GSDMs.

Fig. 6 | A possible GSDMA3 pre-pore conformation. a, b, Model of the
additional ring (a) and its cryo-EM density in the double-ring GSDMA3
pore architecture (b). c, Cryo-EM density of an individual subunit
superimposed on the model. d, e, Superposition of the subunit structure
in the additional ring with that of GSDMA3-NT in the auto-inhibited
conformation (pink)! (d) and the pore conformation (cyan) (e). f, A
hypothetical model for GSDM pore formation. Upon cleavage by an
activating enzyme, GSDM-NT monomers may first bind to membrane
lipids and then oligomerize to form a soluble pre-pore before membrane
insertion.


Comparison with cytolysins 
The large β-barrel architecture of the GSDMA3 pore described here
shares mechanistic similarities with pores formed by bacterial
cytolysins, complement proteins and the pore-forming protein perforin
from cytotoxic T lymphocytes, which belong to the ‘membrane attack
complex perforin-like’/‘cholesterol dependent cytolysin’ (MACPF/
CDC) family'®®. Whereas many structures in the family have been
determined in their soluble forms, the only high-resolution structure
of the pore form is that of pneumolysin!*°. A DALI search?! using
the GSDMA3-NT domain structures in both the full-length context
and the pore form identified weak structural homology at the central
β-sheets to these proteins, including the soluble form and the pore form
of pneumolysin'®”° (Fig. 5a, b), with the highest structural homology
to perfringolysin O, with an r.m.s.d. of 7.3 Å for 134 aligned residues.
Despite the structural similarity at the central β-sheets, we hypothesize
that the GSDM and MACPF/CDC families either evolved
independently or are so distantly related that divergent evolution at the
sequence level and unexpected conserved structural features are not
detected. We present three major differences that support this idea.
First, previous analysis showed that the MACPF and CDC family
proteins contain three conserved glycine residues”, which are absent
in GSDMs (Extended Data Fig. 7). Second, the structural homology
between MACPF and CDC spreads from the central β-sheets to the
clusters of α-helices that switch to the membrane-spanning
β-conformation, yielding much lower r.m.s.d. for many more aligned residues
than with GSDMA3~. Third, the membrane insertion mechanism is
different. For GSDMA3, one β-hairpin is formed from straightening the
β7 and β8 strands into the connected α4 and disordered loops, while
the other hairpin comes from both the elements in between β3 and β5,
and elements preceding β3 that include the α3′ helix and a disordered
loop in the auto-inhibited conformation (Fig. 5c). By contrast, in
pneumolysin, straightening into the connected cluster of helices from
the central β-sheet alone is responsible for the formation of the long
β-hairpins for membrane insertion and β-barrel formation’? (Fig. 5d,
Supplementary Video 3).


The soluble pore may represent a pre-pore

The native cryo-EM map of GSDMA3-NT shows the existence of an
additional pore, creating a yo-yo-shaped double-ring pore (Figs. 1c,
6a, b). Unlike the membrane pore, the bottom ring does not insert
into the membrane, and we therefore call it the soluble pore. We do
not know the physiological significance of the double-ring structure.
On the other hand, most HgCl2-treated GSDMA3 pores contain only
the membrane-inserted ring (Fig. 1e), suggesting that the double-ring
structure is sensitive to the biochemical condition, and could be an
artefact of in vitro reconstitution.

Although the cryo-EM density for the additional ring is much poorer,
we could fit the N-terminal part of the full-length auto-inhibited
GSDMA3 structure’ rigidly into the density (Fig. 6c). Consistent with
lack of membrane insertion, most of the β-strands in the transmembrane
region of the pore form, as well as the N-terminal–C-terminal linker that
harbours the 3C cleavage site, are disordered in this fitted GSDMA3-NT
structure (Fig. 6d, e). Although detailed analysis of the GSDMA3-NT
conformation and its subunit association in the soluble pore would
require improved resolution of this region, the existence of soluble
pre-pore conformations for some MACPF/CDC family members’?
suggests that the soluble GSDMA3-NT pore could represent a pre-pore
state. Notably, a mutant of aerolysin forms asymmetric pores with a
stacked quasi-pore and a soluble pore? that resemble our double-ring
native GSDMA3 pore structure.


Summary 

The cryo-EM structure of the GSDMA3-NT pore revealed that the
N-terminal fragment of GSDM undergoes marked conformational
changes to form the pore structure upon binding to acidic inner leaflet
or bacterial lipids. Difference density and structure-based mutagenesis
suggested that helix α1 is the major site of acidic lipid interaction,
which is required for pore formation. We do not know the function of
the second ring or whether it is part of the physiological pore in cells
or an artefact of in vitro reconstitution; however, we speculate that it
may represent a pre-pore structure that precedes membrane insertion
(Fig. 6f).

In some circumstances, membrane damage caused by bacterial 
cytolysins, complement proteins or perforin can be repaired by the 
ubiquitous plasma membrane repair pathway, which is triggered by a 
calcium influx from the extracellular milieu into the damaged cell. 
In the case of perforin, repair enables target cells to die by apoptosis 
rather than necrosis, which occurs when plasma membrane damage is 
not repaired. Recent reports have described situations in which 
inflammasome activation and GSDMD cleavage lead to IL-1 release without 
cell death, suggesting that damage to the cell membrane caused by 
the GSDMD-NT pore can also be repaired under some circumstances. 
If the double-ring pore exists physiologically, it is possible that under 
these conditions the second ring could affect whether repair occurs or 
might even have a signalling function in these activated cells. 

METHODS 

Cloning, expression and purification of recombinant mouse GSDMA3 and 
human GSDMD. Full-length mouse GSDMA3 sequence was cloned into a modified 
pET28a vector with an N-terminal 6×His-SUMO tag, followed by mutagenesis 
to insert a human rhinovirus 3C protease recognition sequence (LEVLFQGP) 
immediately after residue E262. The human GSDMD sequence was cloned into 
the pDB.His.MBP vector to append an N-terminal 6×His-MBP tag followed by 
a tobacco etch virus (TEV) cleavage site. All mutagenesis was performed using 
the QuikChange Site-Directed Mutagenesis Kit (Stratagene). All plasmids were 
verified by DNA sequencing. 

To obtain recombinant GSDMA3, Escherichia coli BL21 (DE3) cells were 
transfected with the vector and grown in LB medium supplemented with 50 μg/ml 
kanamycin. Protein expression was induced overnight at 18 °C with 0.5 mM 
isopropyl-β-D-thiogalactopyranoside after optical density at 600 nm reached 0.8. 
Cells were harvested and resuspended in a lysis buffer containing 25 mM Tris-HCl at 
pH 8.0 and 150 mM NaCl. The protein was first purified by affinity chromatography 
using Ni-NTA beads (Qiagen) and the 6×His-SUMO tag was removed by 
overnight Ulp1 protease digestion at 4 °C. The cleaved GSDMA3 was further purified 
by HiTrap Q ion-exchange chromatography and Superdex 200 gel filtration 
chromatography (GE Healthcare Life Sciences). Human GSDMD was expressed and 
purified with similar procedures, except that the N-terminal 6×His-MBP tag was 
removed by overnight TEV cleavage. 
Reconstitution of GSDMA3 and GSDMD pores. 
1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) and 
1',3'-bis[1,2-dioleoyl-sn-glycero-3-phospho]-sn-glycerol (sodium salt) 
(cardiolipin) (Avanti Polar Lipids) dissolved 
in chloroform were mixed in a glass tube at a mass ratio of 3:1, and the solvent was 
evaporated under a stream of N2 gas. Buffer composed of 25 mM Tris-HCl at pH 
8.0 and 150 mM NaCl was added to yield a final lipid concentration of 10 mg/ml. 
The lipid suspension was then vortexed continuously for 5 min. To obtain unilamellar 
vesicles, liposomes were extruded through a mini-extruder device (Avanti) for 
21 passes. 

To form GSDMA3 pores on liposomes, purified GSDMA3 was incubated with 
3C protease and POPC-cardiolipin liposomes for 4h on ice. Liposomes were 
harvested by ultracentrifugation at 60,000g at 4°C for 30 min and resuspended in 
0.5 ml lysis buffer containing 50 mM sodium cholate. After centrifugation at the 
maximum speed of an Eppendorf centrifuge for 30 min, the supernatant containing 
the solubilized pores was further purified over a Superose 6 gel-filtration column 
(GE Healthcare Life Sciences) equilibrated with lysis buffer containing 15 mM 
sodium cholate. GSDMD pores were formed in the same way but with recombinant 
caspase-11 in liposomes containing 20% (w/w) phosphatidylserine, and 2% C12E8 
was used to solubilize the pores. 

Negative staining electron microscopy. For negative staining, 10 μl of GSDMA3 
or GSDMD pores was placed onto a glow-discharged copper grid (Electron 
Microscopy Sciences) coated with a layer of thin carbon, washed twice with H2O, 
stained with 2% uranyl formate for 40 s and air-dried. The grids were imaged on 
the Tecnai G2 Spirit BioTWIN electron microscope and recorded with an AMT 2k 
CCD camera (Harvard Medical School Electron Microscopy Facility). 

Cryo-EM data collection. A 3-μl drop of native GSDMA3 pores at 2 mg/ml was 
applied to a glow-discharged Quantifoil grid (R 1.2/1.3 400 mesh, copper, Electron 
Microscopy Sciences), blotted for 3 s in 100% humidity at 4 °C and plunged into 
liquid ethane using an FEI Vitrobot Mark IV. The cryo-grids were imaged in an 
FEI Talos Arctica microscope operating at an acceleration voltage of 200 kV and 
equipped with a cryo-autoloader (University of Massachusetts Cryo-EM Core 
Facility). Cryo-EM data were collected automatically on a K2 Summit direct 
detector camera (Gatan) in super-resolution counting mode, with 8.0 s total exposure 
time and 200 ms per frame. This resulted in movies each containing 40 frames and 
an accumulated dose of 41.1 electrons per Å². The super-resolution pixel size is 
0.5843 Å. The defocus level in the data collection was set in the range of −1.0 to 
−3.0 μm. A total of 3,010 movies was collected. 

For mercury treatment to label free Cys residues, GSDMA3 pores at 2 mg/ml 
(~75 μM) were incubated with HgCl2 at a molar ratio of 1:10 at 4 °C for 2 h. The 
cryo-grids were plunged in the same way as for the native pores. A total of 1,583 
movies was recorded using an FEI Titan Krios electron microscope (National 
Cryo-electron Microscopy Facility, National Cancer Institute) operating at 300 kV 
on a K2 Summit direct electron camera operating in super-resolution counting 
mode, with 12 s total exposure and 300 ms per frame. The defocus range was from 
−0.75 to −2.0 μm. The resulting movies each contain 40 frames and an 
accumulated dose of 40.0 electrons per Å². The super-resolution pixel size is 0.66 Å. 
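
The exposure settings quoted for the two datasets can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name and its arguments are hypothetical and not part of any acquisition software, and the default factor-of-two binning is an assumption taken from the image-processing description of the native dataset.

```python
def exposure_summary(total_exposure_s, frame_time_s, total_dose_e_per_A2,
                     super_res_pixel_A, binning=2):
    """Return (frames per movie, dose per frame, binned pixel size)."""
    # Round to absorb floating-point error in the division (e.g. 8.0 / 0.2).
    n_frames = round(total_exposure_s / frame_time_s)
    dose_per_frame = total_dose_e_per_A2 / n_frames
    binned_pixel_A = super_res_pixel_A * binning
    return n_frames, dose_per_frame, binned_pixel_A


# Native dataset (Talos Arctica): 8.0 s exposure, 200 ms per frame.
native = exposure_summary(8.0, 0.2, 41.1, 0.5843)
# Hg-treated dataset (Titan Krios): 12 s exposure, 300 ms per frame.
hg_treated = exposure_summary(12.0, 0.3, 40.0, 0.66)

print(native)
print(hg_treated)
```

Both settings work out to 40 frames per movie and roughly one electron per Å² per frame, in line with the frame counts stated in the text.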
Image processing. For data of native GSDMA3 pores, raw movies were corrected 
by gain reference and for beam-induced motion, and summed into 
motion-corrected images using MotionCor2. The CTFFIND4 program was used to 
determine the actual defocus level of each micrograph. We initially selected 
462,878 particles from the micrographs with a combination of manual and 
automatic particle picking in RELION. The picked particles were binned two times 
(pixel size 1.1686 Å) and then subjected to reference-free 2D classification. Good 
class averages in different orientations were selected for the reconstruction of an 
initial model in RELION. The resulting initial model, low-pass filtered to 60.0 Å, 
was used as the input reference to conduct unsupervised 3D classifications in 
RELION without assumption of any symmetry. The particle set used contained all 
the side views and top views with apparent C27 symmetry, but not top views with 
apparent C26 or C28 symmetry. The best 3D class showing symmetric features was 
selected for further refinement. Upon imposing C27 symmetry, the reconstructions 
reached a resolution of 4.6 Å, as measured by the Fourier shell correlation (FSC) of a 
gold-standard refinement in which two halves of the dataset were refined 
separately and combined only when building the final map. 
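
Reading a resolution off an FSC curve amounts to finding the spatial frequency at which the curve first falls below the chosen threshold (0.143 for the half-map FSC used here). The following is a minimal sketch, assuming the curve has already been computed by the refinement software. The function name and the toy curve are hypothetical, and real packages may interpolate differently.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in Å) where the FSC first drops below `threshold`.

    `spatial_freqs` are in 1/Å and must be increasing. The crossing point is
    found by linear interpolation between neighbouring samples. Returns None
    if the curve never falls below the threshold.
    """
    pairs = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(pairs, pairs[1:]):
        if c0 >= threshold > c1:
            frac = (c0 - threshold) / (c0 - c1)
            f_cross = f0 + frac * (f1 - f0)
            return 1.0 / f_cross
    return None


# Toy FSC curve with made-up values, for illustration only.
freqs = [0.05, 0.10, 0.15, 0.20, 0.25]
fsc = [1.0, 0.9, 0.6, 0.2, 0.05]
resolution = resolution_at_threshold(freqs, fsc)
print(resolution)
```

The same helper can be called with a threshold of 0.5, the criterion commonly applied to model-versus-map curves.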

For mercury-soaked GSDMA3 pores, the same procedures were used to process 
the dataset but two rounds of 3D classification were performed after 2D 
classification. In brief, 446,553 particles were selected by automatic particle picking in 
RELION. Next, 107,564 particles containing good 2D classes of all side views, top 
views of C27 and C28 classes, and top views of classes with uncertain symmetry 
were selected for the first round of 3D classification using an angular sampling 
interval of 7.5° without applying symmetry. This 3D classification yielded four classes, 
of which two had apparent C27 and C28 symmetry, respectively. Subsequent 3D 
classification was performed using an angular sampling interval of 3.7° for both the 
C27 and C28 classes. From a 3D class with C27 symmetry, 40,086 particles were 
selected for auto-refinement by imposing symmetry in RELION. For the pore with 
C28 symmetry, 16,581 particles from the 3D class with the best resolution 
were extracted and refined using SPHIRE. Final resolutions for the C27 and C28 
pores are 3.8 Å and 4.2 Å, respectively. Local resolution distribution of all maps 
was determined by ResMap. 

Model building and analysis. The 3.8 Å map with 27-fold symmetry from the 
mercury-soaked data set and the crystal structure of full-length GSDMA3 (PDB 
ID: 5B5R) were used for model building. The model of the N-terminal domain 
was docked into the map as a rigid body in Chimera. The fitted model accounted 
well for most of the density in the globular domain but required extensive 
remodelling in the β-barrel region using Coot. PHENIX was used to refine the model 
against the cryo-EM density in real space and to ensure proper geometry. All 
structural and density representations were generated using either Chimera or 
PyMOL (https://www.pymol.org). Model versus map FSC curves were calculated 
using phenix.mtriage to obtain an estimate of the final resolution for the models. 
Liposome leakage assay. The leakage of liposomes encapsulating TbCl3 was 
determined by an increase in fluorescence intensity when Tb³⁺ bound to dipicolinic 
acid (DPA) in the external buffer. Tb³⁺-entrapped liposomes were prepared using 
a similar procedure as detailed in ‘Reconstitution of GSDMA3 and GSDMD pores’, 
with the exception that the buffer contained 20 mM HEPES at pH 7.4, 150 mM 
NaCl, 50 mM sodium citrate and 15 mM TbCl3. The suspension was loaded onto 
a Superose 6 gel filtration column to remove unincorporated TbCl3. Purified 
liposomes were supplemented with 50 μM DPA before addition of a recombinant 
GSDM protein (0.5 μM) and caspase-11 or 3C protease (0.2 μM). Fluorescence at 
545 nm after excitation at 276 nm was recorded continuously for 45 min at 30-s 
intervals using a Molecular Devices SpectraMax M5 plate reader. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The cryo-EM maps have been deposited in the Electron 
Microscopy Data Bank under accession numbers EMD-7449 (HgCl2-treated, C27 
membrane pore), EMD-7450 (HgCl2-treated, C28 membrane pore) and EMD-7451 
(native, double-ring pore). The atomic structure coordinates have been deposited 
in the Protein Data Bank under the accession number 6CB8. All other data can be 
obtained from the corresponding author upon reasonable request. 

Extended Data Fig. 1 | Reconstitution of GSDMA3 pores in vitro. 
a, A general procedure for GSDMA3 pore reconstitution. b, Size-exclusion 
chromatography of GSDMA3 pores extracted from liposomes (top) 
and Coomassie blue-stained SDS-PAGE of the collected fractions. 
c, Representative negative-stain electron microscopy images of GSDMA3. 
d, Representative negative-stain electron microscopy images of human 
GSDMD pores with double-rimmed side views. e, f, Representative 
cryo-EM images of GSDMA3 pores with double-rimmed side views (e) 
and HgCl2-treated GSDMA3 pores (f). Scale bar, 50 nm (c-f). All results 
were confirmed at least three times as technical replicates. 

Extended Data Fig. 2 | Cryo-EM analysis of double- and single-ring 
GSDMA3 pores. a-c, Gold-standard FSC plots from two 
half-reconstructions refined separately in RELION for the 27-fold 
double-ring pore map (a), the 27-fold single-ring pore map (b) and the 
28-fold single-ring pore map (c). Model-to-map correlations are shown for 
the 27-fold single-ring pore (b). d, e, Local resolution estimation generated 
by ResMap on the two half maps separately refined in a gold-standard 
procedure in RELION. Local resolutions are colour-coded on the densities. 
Highest resolution is observed at the β-barrel domain for both the 
single-ring pore (d) and the membrane-inserted ring of the double-ring 
pore (e). The globular domains and the additional ring exhibit relatively 
low resolution. 

Extended Data Fig. 3 | Cryo-EM density validation. Close-up views of GSDMA3 subunit model fitted into the cryo-EM density map at four locations 
denoted by residue numbers. 

Extended Data Fig. 5 | Lipid binding by GSDM. a, Effect of the GSDMA3 
R132A/R145A mutation on the liposome-leakage activity, monitored 
by measuring DPA-chelating-induced fluorescence of released Tb³⁺ 

Extended Data Fig. 6 | Two-residue shift in the hydrogen-bonding pattern in each GSDMA3 subunit. The shift results in a shear number of 
27 × 2 = 54 for a 27-fold symmetric pore. 

Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics 

                                     #1 Hg-treated,           #2 Hg-treated,    #3 Native, C27-symmetry, 
                                     C27-symmetry             C28-symmetry      stacked double pore 
                                     (EMDB-7449) (PDB 6CB8)   (EMDB-7450)       (EMDB-7451) 

Data collection and processing 
  Voltage (kV)                       300                      300               200 
  Electron exposure (e−/Å²)          40.0                     40.0              41.1 
  Defocus range (μm)                 −0.75 to −2.0            −0.75 to −2.0     −0.1 to −3.0 
  Super-resolution pixel size (Å)    0.66                     0.66              0.5843 
  Symmetry imposed                   C27                      C28               C27 
  Initial particle images (no.)      446,553                  446,553           462,878 
  Final particle images (no.)        40,086                   16,581            64,248 
  Map resolution (Å)                 3.8                      4.2               4.6 
    FSC threshold                    0.143                    0.143             0.143 
  Map resolution range (Å)           60.0-3.8                 60.0-4.2          60.0-4.6 

Refinement 
  Initial model used (PDB code)      5B5R 
  Model resolution (Å)               4.0 
    FSC threshold                    0.5 
  Model resolution range (Å)         84.5-3.8 
  Map sharpening B factor (Å²)       −131.255 
  Model composition 
    Non-hydrogen atoms               1,844 
    Protein residues                 230 
    Ligands                          14 
  B factors (Å²) 
    Protein                          101.86 
    Ligand                           92.69 
  R.m.s. deviations 
    Bond lengths (Å)                 0.009 
    Bond angles (°)                  1.122 

Validation 
  MolProbity score                   2.46 (99th percentile)* 
  Clashscore                         16.22 (97th percentile)* 
  Poor rotamers (%)                  0.00 
  Ramachandran plot 
    Favored (%)                      86.28 
    Allowed (%)                      13.72 
    Disallowed (%)                   0.00 

*100th is the best among the structures of comparable resolution, 0th is the worst. 

Helium in the eroding atmosphere of an exoplanet 


J. J. Spake, D. K. Sing, T. M. Evans, A. Oklopčić, V. Bourrier, L. Kreidberg, B. V. Rackham, J. Irwin, D. Ehrenreich, 
A. Wyttenbach, H. R. Wakeford, Y. Zhou, K. L. Chubb, N. Nikolov, J. M. Goyal, G. W. Henry, M. H. Williamson, 
S. Blumenthal, D. R. Anderson, C. Hellier, D. Charbonneau, S. Udry & N. Madhusudhan 


Helium is the second-most abundant element in the Universe after 
hydrogen and is one of the main constituents of gas-giant planets 
in our Solar System. Early theoretical models predicted helium to 
be among the most readily detectable species in the atmospheres 
of exoplanets, especially in extended and escaping atmospheres. 
Searches for helium, however, have hitherto been unsuccessful. 
Here we report observations of helium on an exoplanet, at a 
confidence level of 4.5 standard deviations. We measured the 
near-infrared transmission spectrum of the warm gas giant WASP-107b 
and identified the narrow absorption feature of excited metastable 
helium at 10,833 angstroms. The amplitude of the feature, in transit 
depth, is 0.049 ± 0.011 per cent in a bandpass of 98 angstroms, 
which is more than five times greater than what could be caused 
by nominal stellar chromospheric activity. This large absorption 
signal suggests that WASP-107b has an extended atmosphere that is 
eroding at a total rate of 10¹⁰ to 3 × 10¹¹ grams per second (0.1-4 per 
cent of its total mass per billion years), and may have a comet-like 
tail of gas shaped by radiation pressure. 

WASP-107b is one of the lowest-density planets known, with a 
radius, (0.94 ± 0.02)RJ, similar to that of Jupiter but a much lower 
mass, (0.12 ± 0.01)MJ; RJ and MJ are the radius and mass of Jupiter, 
respectively. It orbits an active K6 dwarf every 5.7 days at a distance of 
0.055 ± 0.001 astronomical units. On 31 May 2017, we observed a 
primary transit of WASP-107b with Wide Field Camera 3 (WFC3), which 
is onboard the Hubble Space Telescope (HST). Our observations lasted 
7 h and we acquired 84 time-series spectra with the G102 grism, which 
covers the 8,000-11,000 Å wavelength range. Further details about the 
observations and data reduction can be found in Methods. 
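
The low density follows directly from the quoted mass and radius. The sketch below makes the arithmetic explicit; the Jupiter mass and equatorial radius are nominal values assumed here rather than taken from the paper, and the function name is hypothetical.

```python
import math

# Assumed nominal constants (not from the paper).
M_JUP_KG = 1.898e27      # Jupiter mass in kg
R_JUP_M = 7.1492e7       # Jupiter equatorial radius in m


def bulk_density_g_cm3(mass_mjup, radius_rjup):
    """Mean density of a spherical planet, in g/cm^3."""
    mass_kg = mass_mjup * M_JUP_KG
    radius_m = radius_rjup * R_JUP_M
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m ** 3
    # 1 kg/m^3 equals 1e-3 g/cm^3.
    return mass_kg / volume_m3 * 1e-3


rho_wasp107b = bulk_density_g_cm3(0.12, 0.94)
rho_jupiter = bulk_density_g_cm3(1.0, 1.0)
print(rho_wasp107b, rho_jupiter)
```

With these constants the planet comes out at about 0.18 g cm⁻³, roughly one-seventh of the value the same formula gives for Jupiter.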

Each spectrum was integrated along the wavelength axis to first 
produce a ‘white’ light curve (Extended Data Fig. 1). In addition to 
the planetary transit signal, the resulting time series was affected by 
instrumental systematic errors caused by electron trapping in the 
WFC3 detector. We fitted the white-light curve with a planetary transit 
model multiplied by a linear baseline trend and a physically motivated 
WFC3 systematics model. For the planetary transit model, we allowed 
the planet-to-star radius ratio (Rp/Rs) and the mid-transit time (t0) to 
vary as free parameters, while holding the ratio of the orbital distance 
to the stellar radius (a/Rs), the inclination (i), the eccentricity (e) and 
the period (P) fixed to previously determined values. We assumed a 
quadratic limb-darkening profile for the star, holding the coefficients 
fixed to values determined from a model stellar spectrum. Further 
details about this fit are provided in Methods. The results of the fit are 
reported in Extended Data Table 1 and Extended Data Fig. 1. 

Two sets of spectroscopic light curves were constructed by summing 
each spectrum into broad- and narrowband bins. The first set consisted 
of 9 broadband channels spanning the 8,770-11,360 Å wavelength 
range, and the second set comprised 20 overlapping, narrowband 
channels spanning the 10,580-11,070 Å wavelength range. The narrowband 
channels covered the helium absorption triplet at 10,833 Å (vacuum 
wavelength; the air wavelength of this line is 10,830 Å). The widths 
of the broadband and narrowband channels were 294 Å (12-pixel 
columns) and 98 Å (4-pixel columns), respectively. We fitted both sets 
of spectroscopic light curves using the approach described above for 
the white-light curve. However, for the planetary transit signals, we only 
allowed Rp/Rs to vary as a free parameter, while holding t0, a/Rs, i, e and 
P fixed to the values reported in Extended Data Table 1. We fixed the 
limb-darkening coefficients in a similar way to the white-light curve 
fit. Additional details of the fitting procedure are given in Methods. 
The inferred values for the transit depth, (Rp/Rs)², in each wavelength 
channel are shown in Fig. 1 and Extended Data Table 2. These results 
constitute the atmospheric transmission spectrum. 

The broadband transmission spectrum is consistent with a previous 
transmission spectrum for WASP-107b, obtained using the WFC3 G141 
grism, which covers the 11,000-16,000 Å wavelength range. The latter 
exhibits a muted water absorption band centred at 14,000 Å, with an 
otherwise flat spectrum implying an opaque cloud deck. After applying 
a correction for stellar activity variations between the G102 and G141 
observation epochs (see Methods), the G102 spectrum aligns with the 
cloud deck level inferred from the G141 spectrum (Fig. 1). 

Fig. 1 | Combined near-infrared transmission spectrum for WASP-107b 
with the helium absorption feature. a, Data are plotted on a linear scale. 
Points, with 1σ error bars, are from a previous study (black) and this work 
(light and dark blue), both corrected for stellar activity (see Methods). The 
solid purple line is the best fit obtained with a lower-atmosphere retrieval 
model based on the Markov chain Monte Carlo technique, and the 
pink-shaded area encompasses 99.7% of the Markov chain Monte Carlo samples. 
The gold line is the best-fitting absorption profile for the 10,830-Å helium 
line, obtained by our 1D escaping-atmosphere model. b, Same as a, on a 
logarithmic scale. The dashed blue line shows the Roche radius. 

Fig. 2 | Narrowband transmission spectrum of WASP-107b, centred 
on 10,833 Å. Each spectroscopic channel has been shifted one pixel away 
from the next one. Non-overlapping bins are shown by blue symbols. All 
error bars are 1σ. The peak of the spectrum coincides with the 2³S helium 
absorption line at 10,833 Å. 


The helium triplet has an expected width of approximately 3 Å, 
whereas the resolution of the G102 grism is 67 Å (about 3 pixels) 
at 10,400 Å. Therefore, to make a finely sampled transmission 
spectrum, we shifted each of the 20 narrowband channels by one pixel 
with respect to the adjacent channel along the wavelength axis. The 
narrowband transmission spectrum peaked at the channel most closely 
centred at 10,833 Å (Fig. 2), as would be expected if helium absorption 
in the planetary atmosphere were responsible for the signal. To estimate 
the amplitude of the absorption feature, we focused on five 
non-overlapping channels centred on 10,833 Å. All but one of these channels 
were consistent with a baseline transit-depth level of 2.056% ± 0.005%. 
The single exception was the channel centred on the 10,833-Å helium 
triplet, where the transit depth was visibly larger than for the surrounding 
channels (Fig. 3), and we obtained (Rp/Rs)² = 2.105% ± 0.010%. We 
ruled out various alternative explanations for the signal, including 
other absorbing species, helium in Earth’s atmosphere, and occultation 
of inhomogeneities in the stellar chromosphere and photosphere 
(see Methods). 
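
The amplitude of the feature is the difference between the in-line transit depth and the baseline level. As a rough cross-check, the sketch below combines the two quoted uncertainties in quadrature, which assumes they are independent; that assumption and the function name are ours, and the paper's own figure may rest on unrounded values.

```python
import math


def excess_depth(in_line, in_line_err, baseline, baseline_err):
    """Return (excess, uncertainty, significance) for two transit depths.

    Depths and errors are in per cent. Errors are combined in quadrature,
    which assumes the two measurements are independent.
    """
    excess = in_line - baseline
    err = math.hypot(in_line_err, baseline_err)
    return excess, err, excess / err


# Values quoted in the text: helium channel versus the baseline level.
excess, err, n_sigma = excess_depth(2.105, 0.010, 2.056, 0.005)
print(excess, err, n_sigma)
```

With these rounded inputs the excess is 0.049% with an uncertainty of about 0.011%, or roughly 4.4σ, close to the 4.5σ confidence level quoted for the detection.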

The metastable helium probed by 10,833-Å line absorption forms 
high up in planetary atmospheres, at microbar- to nanobar-level 
pressures, where stellar extreme-ultraviolet radiation is absorbed. On the 
other hand, absorption of the neighbouring continuum occurs deeper in 
planetary atmospheres, at millibar- to bar-level pressures. Therefore, to 
interpret the broadband (continuum) and narrowband (about 10,833 Å) 
transmission spectra, we used separate lower- and upper-atmosphere 
models. For the combined G102 and G141 broadband spectrum (with 
the 10,775-10,873 Å range removed), we performed an atmospheric 
retrieval analysis using our one-dimensional (1D) radiative transfer 
code, ATMO (see Methods and Extended Data Table 3). We found 
that the broadband data were well explained by a grey absorbing 
cloud deck across the full 8,780-11,370 Å wavelength range, in 
addition to H2O absorption. We obtained a volume mixing ratio of 
5 × 10⁻³ to 4 × 10⁻² for H2O, consistent with previous estimations. 

We investigated the narrowband transmission spectrum using 
two numerical models for the upper atmosphere of WASP-107b 
(see Methods). The first, a 1D model, solves for the level 
populations of a H/He Parker wind, and suggests that WASP-107b is losing 
its atmosphere at a rate of 10¹⁰ to 3 × 10¹¹ g s⁻¹, corresponding to about 
0.1%-4% of its total mass every billion years. The second, a 
three-dimensional (3D) model, obtains an escape rate of 10⁶ to 10⁷ g s⁻¹ for 
metastable helium (for comparison, the 1D model gives an escape rate 
of about 10⁶ g s⁻¹ for 2³S helium). It also suggests that stellar radiation 

Fig. 3 | Transit light curves for three 98-Å-wide spectroscopic channels. 
a, Dark-blue points are from the channel centred on the He I 10,833-Å 
line; gold and light-blue points are from the two adjacent channels. All 
data points have 1σ error bars. Solid lines are best-fit light-curve model 
results. The transit depth of the dark-blue curve is visibly the largest. 
b, Binned difference between the light curves of the 10,775-Å and 
10,873-Å channels, and the average of the two adjacent channels (blue 
points, 1σ errors), highlighting the excess absorption. This difference is 
well explained by both our 1D (green line) and 3D (red line) 
escaping-atmosphere models. Green and red points are binned model results. 


pressure blows the escaping helium atoms away from the planet so 
swiftly that they form a tail nearly aligned with the star-planet axis; 
this could explain the lack of post-transit occultation detected in our 
data (Fig. 3). The radiation pressure may also blue-shift the absorption 
signature over velocities of hundreds of kilometres per second, which 
would be observable at higher spectral resolution (Fig. 4). 
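
The conversion from an escape rate in grams per second to a fractional mass loss per billion years is a one-line calculation. The sketch below reproduces it for the 1D-model range of 10¹⁰ to 3 × 10¹¹ g s⁻¹ and a planet mass of 0.12 MJ; the Jupiter mass and the length of a year are nominal values assumed here, and the function name is hypothetical.

```python
# Assumed nominal constants (not from the paper).
M_JUP_G = 1.898e30           # Jupiter mass in grams
SECONDS_PER_YEAR = 3.156e7   # one year in seconds


def fraction_lost_per_gyr(rate_g_per_s, planet_mass_mjup):
    """Fraction of the planet's mass lost in one billion years at a constant rate."""
    planet_mass_g = planet_mass_mjup * M_JUP_G
    mass_lost_g = rate_g_per_s * SECONDS_PER_YEAR * 1e9
    return mass_lost_g / planet_mass_g


low = fraction_lost_per_gyr(1e10, 0.12)
high = fraction_lost_per_gyr(3e11, 0.12)
print(f"{100 * low:.2f}% to {100 * high:.1f}% per Gyr")
```

The result, a little over 0.1% at the low end and a little over 4% at the high end, matches the 0.1%-4% range stated in the text. It assumes a constant rate over the whole interval, which is a simplification.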

Atmospheric mass loss can substantially alter the bulk composition 
of a planet. For example, there is evidence that atmospheric escape is 
responsible for the observed dearth of highly irradiated super-Earth 
and sub-Neptune exoplanets with sizes between 1.6 and 2 Earth 
radii. To evaluate planet formation theories and assess whether 
these planets have substantial H/He envelopes, it is necessary to 
understand how atmospheric mass loss affects the subsequent evolution 
of bodies that start with sizeable atmospheres. Empirical constraints, 
such as that presented here for WASP-107b, are therefore crucial for 
retracing evolutionary pathways and interpreting the present-day 
population of planets. 

Until now, extended atmospheres have been detected on three 
exoplanets by targeting the Lyα line in the ultraviolet and on one 
exoplanet using the optical Hα line. Our observations of WASP-107b 
using the 10,833-Å line provide not only the first detection of helium 
on an exoplanet, but also the first detection of an extended exoplanet 
atmosphere at infrared wavelengths. Our results demonstrate the 
feasibility of a new method for studying extended atmospheres that is 
complementary to those using the two hydrogen lines. 

We note that ground observations targeting the 10,833-Å helium 
triplet are possible with existing high-resolution infrared 
spectrographs. In the near future, high-signal-to-noise observations will also 

Fig. 4 | Results of the two models for the upper atmosphere of 
WASP-107b. a, Best-fitting absorption profiles of the helium 10,833-Å triplet line 
from the 1D (blue), and 3D (orange) models. Both models reproduce the 
measured excess absorption of 0.049% ± 0.011% in a 98-Å bin. 
Higher-resolution observations will resolve the profile shape and further constrain 
the velocity of the planetary wind. b, Radial number density profiles of 
different atmospheric species from the 1D model; blue-shaded regions are 
1σ errors. c, Top-down view of the planetary system from the 3D model, 
showing a comet-like tail of 2³S helium shaped by stellar radiation pressure. 


be possible with the James Webb Space Telescope at a spectral 
resolution of about 4 Å (approximately 110 km s⁻¹). 
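
A wavelength resolution element maps to a velocity through the non-relativistic Doppler relation Δv = c Δλ/λ. The sketch below applies it to the 4 Å figure quoted above and, for comparison, to the 67 Å G102 resolution at 10,400 Å; the function name is hypothetical.

```python
C_KM_S = 299792.458  # speed of light in km/s


def velocity_resolution_km_s(delta_lambda_A, wavelength_A):
    """Velocity width of a resolution element, using delta_v = c * dlambda / lambda."""
    return C_KM_S * delta_lambda_A / wavelength_A


# 4 Å at the helium triplet, as quoted for the James Webb Space Telescope.
jwst_like = velocity_resolution_km_s(4.0, 10833.0)
# 67 Å at 10,400 Å, the G102 grism resolution quoted in the text.
g102 = velocity_resolution_km_s(67.0, 10400.0)
print(jwst_like, g102)
```

The first value is close to 110 km s⁻¹. The second is nearly 2,000 km s⁻¹, which illustrates why blueshifts of a few hundred kilometres per second cannot be resolved in the G102 data.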


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0067-5. 

METHODS 

Observations and data reduction. We observed one transit of WASP-107b with 
WFC3 in spectroscopic mode using the G102 grism (programme GO-14916; 
principal investigator, J.J.S.), which covers the approximate wavelength range 
of 8,780-11,370 Å. We used forward spatial scanning to spread the spectra over 
about 60 pixels in the cross-dispersion direction with the SPARS10, NSAMP = 15 
setup, giving exposure times of about 103 s. This allowed 17 exposures per HST 
orbit. The observations lasted for five HST orbits, with two orbits pre-transit, one 
during the transit and one post-transit, allowing us to constrain the out-of-transit 
baseline precisely. 

The raw frames were automatically reduced with the CalWF3 pipeline (version 
3.3) of STScI. From the reduced frames we extracted the 1D spectra following 
standard methods: we built up flux counts by summing the difference between 
successive non-destructive reads. We then removed the background from each read 
difference by subtracting the median of a box of pixels uncontaminated by the 
spectrum. We found the flux-weighted centre of each scan and set to zero all pixels that 
were more than 75 rows away from the centre in the cross-dispersion axis, which 
removed many cosmic rays. The remaining cosmic rays were flagged by finding 
4σ outliers relative to the median along the dispersion direction. We replaced each 
flagged pixel with the median along the dispersion direction, which was rescaled 
to the count rate of the cross-dispersion column. Because the scans were visibly 
tilted from the dispersion axis, we used the package Apall of the Image Reduction 
and Analysis Facility to fit the trace of the two-dimensional scans and extract 1D 
spectra. We found the wavelength solutions by cross-correlating the extracted 
spectra with the ATLAS model stellar spectrum that most closely matches WASP-107 
(effective temperature, Teff = 4,500 K; surface gravity, log(g) = 4.5 in CGS units), 
modulated by the G102 grism throughput. Following standard methods, we 
interpolated each spectrum onto the wavelength range of the first exposure to account 
for shifts in the dispersion axis over time. 
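
The cosmic-ray step described above (flag 4σ outliers relative to a median, then replace them with that median) can be sketched for a single row of pixels. This is a simplified illustration, not the authors' pipeline: the scatter is estimated from the median absolute deviation, which is our assumption, the rescaling to each column's count rate is omitted, and the function name and pixel values are hypothetical.

```python
import statistics


def flag_and_replace(row, n_sigma=4.0):
    """Replace outliers in `row` with the row median.

    Returns (cleaned_row, flagged_indices). The scatter is estimated as
    1.4826 * MAD, which equals the standard deviation for Gaussian noise.
    """
    med = statistics.median(row)
    mad = statistics.median([abs(v - med) for v in row])
    sigma = 1.4826 * mad
    cleaned, flagged = [], []
    for i, value in enumerate(row):
        if sigma > 0 and abs(value - med) > n_sigma * sigma:
            cleaned.append(med)
            flagged.append(i)
        else:
            cleaned.append(value)
    return cleaned, flagged


# Made-up pixel counts with one cosmic-ray hit at index 5.
row = [100.0, 102.0, 98.0, 101.0, 99.0, 950.0, 100.0, 103.0, 97.0]
cleaned, flagged = flag_and_replace(row)
print(flagged, cleaned)
```

A robust scatter estimate is used here because a plain standard deviation would be inflated by the very outliers the step is meant to find.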
White-light curve analysis. We extracted the white-light curve by summing the 
total counts of each 1D spectrum. To constrain the mid-time of the transit, we 
fitted the resulting time series with the BATMAN transit model, multiplied by a 
linear baseline trend and a physically motivated systematics model. For the latter, 
we employed the RECTE model, which accounts for two populations of charge 
traps in individual pixels of the detector and successfully replicates the ramp-like 
features that dominate the systematics. The RECTE model allows us to keep the 
first-orbit observations in our fit. The free parameters of our final model were: 
the planet-to-star radius ratio, Rp/Rs; the mid-transit time, t0; the gradient (c1) and 
y-intercept (c0) of the linear background trend; four parameters for the 
charge-trapping model, namely the initial numbers of populated slow (spop) and fast (fpop) traps, 
and the corresponding changes in the two populations (δs and δf) between each 
orbit; and an uncertainty rescaling factor, β, for the expected photon noise. We 
fixed a/Rs, i, e, and the period using estimates from Kepler light curves. To model 
the stellar limb darkening, we fitted a four-parameter nonlinear limb-darkening 
law to the ATLAS stellar model described above. 
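
For reference, the two limb-darkening laws mentioned in this paper have simple closed forms in terms of μ, the cosine of the angle between the line of sight and the stellar surface normal. The sketch below writes out the standard quadratic law and the standard four-parameter nonlinear law; the coefficient values are made up for illustration and are not the fitted values.

```python
def quadratic_ld(mu, u1, u2):
    """Quadratic law: I(mu)/I(1) = 1 - u1*(1 - mu) - u2*(1 - mu)**2."""
    return 1.0 - u1 * (1.0 - mu) - u2 * (1.0 - mu) ** 2


def nonlinear_ld(mu, coeffs):
    """Four-parameter law: I(mu)/I(1) = 1 - sum_k c_k * (1 - mu**(k/2)), k = 1..4."""
    if len(coeffs) != 4:
        raise ValueError("expected four coefficients")
    return 1.0 - sum(c_k * (1.0 - mu ** (k / 2.0))
                     for k, c_k in enumerate(coeffs, start=1))


# Hypothetical coefficients, for illustration only.
example_quadratic = (0.4, 0.2)
example_nonlinear = (0.5, -0.2, 0.3, -0.1)

for mu in (1.0, 0.5, 0.0):
    print(mu, quadratic_ld(mu, *example_quadratic),
          nonlinear_ld(mu, example_nonlinear))
```

Both laws are normalized to unity at disc centre (μ = 1) and darken towards the limb (μ = 0) for typical coefficients.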

Because the shape of the ramp-like systematics depends on the count level of 
the illuminated pixels, the RECTE model requires the ‘intrinsic’ count rate of a 
pixel (that is, the actual flux received from the star) to model the charge trap- 
ping. To create a template of the intrinsic count rate, we median-combined four 
raw images from the end of the second orbit, in which the charge traps appear 
completely filled and the ramp shape of the light curve has tapered to a flat line. 
Although it is possible to model each illuminated pixel, this is computationally 
expensive for a large scan. Additionally, the ramp profile is blurred by systematic 
errors introduced by telescope jittering and pointing drift. Instead, we divided the 
scan into columns with a width of 10 pixels along the dispersion axis and fed the 
median-count profiles into the model. 
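
The column binning described above can be sketched as follows. This is a minimal illustration, not the authors' extraction code: the image layout (rows along the cross-dispersion axis, columns along the dispersion axis) and the function name `median_column_profiles` are assumptions, as is taking the median across the columns of each block.

```python
from statistics import median

def median_column_profiles(image, width=10):
    """Collapse a 2D scan into one median cross-dispersion profile per column block.

    image: list of rows (cross-dispersion axis); each row is a list of pixel
    counts along the dispersion axis (assumed layout).
    """
    n_cols = len(image[0])
    profiles = []
    for start in range(0, n_cols, width):
        stop = min(start + width, n_cols)
        # Median over the columns of this block, evaluated row by row.
        profiles.append([median(row[start:stop]) for row in image])
    return profiles

# Toy scan: 3 rows, 20 columns, so two 10-pixel-wide blocks.
image = [[float(c) for c in range(20)] for _ in range(3)]
profiles = median_column_profiles(image)
```

Each profile would then stand in for the individual pixels of its block when the charge-trapping model is evaluated, which trades spatial detail for a much smaller number of model evaluations.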

We used the Markov chain Monte Carlo (MCMC) package emcee to marginalize 
over the parameter space of the model likelihood distribution. We used 
80 walkers and ran chains for 8,000 steps, discarding the first 800 as burn-in, 
before combining the walker chains into a single chain. The best-fitting model 
and residuals are shown in Extended Data Fig. 1, with the parameter values and 
1σ uncertainties reported in Extended Data Table 1. Although WASP-107b orbits 
an active star, we see no evidence of star spot crossings. For context, only five 
spot-crossing events have been reported in 10 Kepler transits. 
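
As a rough sketch of the post-processing step described above (discarding burn-in, merging the walker chains and quoting a median with a 68% interval), the following pure-Python example may help. It does not call emcee; the helper names and the use of the 16th, 50th and 84th percentiles are assumptions made for illustration.

```python
def flatten_chains(chains, burn_in):
    """chains[w][s] is one parameter's value for walker w at step s."""
    flat = []
    for walker in chains:
        flat.extend(walker[burn_in:])  # drop the burn-in steps of every walker
    return flat

def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def summarize(chains, burn_in):
    """Return (median, lower error, upper error) from the merged chain."""
    flat = sorted(flatten_chains(chains, burn_in))
    p16, p50, p84 = (percentile(flat, q) for q in (16, 50, 84))
    return p50, p50 - p16, p84 - p50

# Sample count implied by the quoted settings.
n_walkers, n_steps, burn_in = 80, 8000, 800
n_samples = n_walkers * (n_steps - burn_in)

# Toy chains: 3 walkers, 2 burn-in steps followed by the values 0..100.
toy = [[0, 0] + list(range(101)) for _ in range(3)]
median_value, err_lo, err_hi = summarize(toy, burn_in=2)
```

With the settings quoted in the text, the merged chain holds 576,000 samples per parameter.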

Broadband spectroscopic light-curve fit. We binned each spectrum into nine 
spectroscopic channels across the 8,780-11,370 A wavelength range, each span- 
ning 10-12 detector pixels. The resulting light curves are shown in Extended Data 
Fig. 1. Because the throughput of the G102 grism is wavelength-dependent, the 
shape of the charge-trapping ramp in each spectroscopic light curve is different. 
Therefore, we simultaneously fitted each channel using a transit model multiplied 
by a linear baseline trend and a charge-trap model. To make a template of the 
intrinsic counts, we took the median cross-dispersion-direction profile of each 
channel in the same four raw images as used in the white-light curve fit. We fixed 
t_0 to the value found from the white-light curve fit. As in the white-light 
curve fit, we fixed the orbital parameters to those derived from Kepler light curves 
and the wavelength-dependent limb-darkening coefficients to those from the ATLAS model. 
Therefore, for each channel the fitted parameters were R_p/R_s, c_1, c_0, s_pop, f_pop, δ_s, δ_f 
and β. We ran MCMC fits for each light curve using emcee with 80 walkers, 80,000 
steps and a burn-in of 800. 

As a test, we ran additional fits for the spectroscopic light curves with the stellar 
limb-darkening coefficients as free parameters. This produced results that 
were consistent, within 1σ, with those obtained from the analysis in which the 
limb-darkening coefficients were fixed. 

We show the resulting spectroscopic light curves divided by their best-fitting 
systematics models in Extended Data Fig. 1, along with their residuals. Extended 
Data Table 2 reports our median values for the transit depth, (R_p/R_s)², with 1σ 
uncertainties calculated from the MCMC chains. We also list the root mean square 
of the residuals for each channel, which ranges from 1.038 to 1.198 times the 
photon noise. 

Narrowband spectroscopic light-curve fit around 10,830 A. To target the 
10,833-A helium triplet, we binned the spectra from 10,590 to 11,150 A into 20 
narrowband channels. Each channel spanned four detector pixels, as a compromise 
between the low instrument resolution, the signal-to-noise ratio and the narrow- 
ness of the targeted feature. The wavelength coverage of each channel was shifted 
relative to the adjacent channel by one pixel, so the channels overlap. 
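
The overlapping channel layout can be illustrated with a short sketch. The pixel count of 23 is an assumption, chosen because 20 four-pixel bins shifted by one pixel need 20 + 4 - 1 = 23 pixels; the function name is hypothetical.

```python
def sliding_channels(n_pixels, bin_width=4, step=1):
    """Return inclusive (first_pixel, last_pixel) index pairs for overlapping bins."""
    return [(start, start + bin_width - 1)
            for start in range(0, n_pixels - bin_width + 1, step)]

# 23 pixels (assumed) give 20 four-pixel channels, each shifted by one pixel.
channels = sliding_channels(n_pixels=23)
```

Because adjacent channels share three of their four pixels, the resulting transit depths are strongly correlated and should not be treated as 20 independent measurements.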

We note that because the reported resolution of the G102 grism is λ/Δλ ~ 155 
at 10,400 Å (which corresponds to Δλ = 67 Å, or 2.7 pixel widths), the smallest bins 
that are theoretically possible are three pixels wide. A resolution of three pixels 
could be achieved if the 10,833-Å feature lay in the centre of a pixel; however, in 
our data it lies considerably blue-wards of the centre of its pixel. This means there is 
some 10,833-Å flux in the pixel located two pixels blue-wards of the 10,833-Å line. 
Indeed, when we tested the three-pixel case, we found that the amplitude of the 
10,833-Å feature increased by 0.011% compared with the four-pixel-bin fit, which 
is similar to the expected increase of 0.016% if all the 10,833-Å flux fell within a 
central three-pixel bin. With three-pixel bins the feature also appeared to have a 
slight blue ‘wing’, which is unlikely to have an astrophysical origin because such 
wings would be expected from binning the data to a resolution higher than that of 
the spectrograph. We therefore used conservative four-pixel bins. 
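
The arithmetic behind the resolution argument can be checked with a few lines. The variable names are illustrative, and the per-pixel dispersion is only the value implied by the quoted 2.7 pixel widths, not an instrument specification.

```python
import math

def resolution_element(wavelength, resolving_power):
    """Width of one resolution element, delta-lambda = lambda / R."""
    return wavelength / resolving_power

delta_lambda = resolution_element(10_400.0, 155.0)   # in angstroms, about 67
angstrom_per_pixel = delta_lambda / 2.7              # implied by the quoted 2.7 pixels
min_bin_pixels = math.ceil(2.7)                      # smallest whole-pixel bin
```

Rounding 2.7 pixels up to a whole number of pixels gives the three-pixel minimum quoted in the text; the four-pixel bins adopted in the analysis sit one pixel above that limit.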

Extended Data Fig. 2 shows the spectroscopic light curves divided by their 
best-fitting systematics models, along with their residuals. Extended Data Table 2 
shows our median values for the transit depth and their 1σ uncertainties, calculated 
from the MCMC chains. We also list the root mean square of the residuals for each 
channel, which range from 0.976 to 1.22 relative to the photon noise. The resulting 
transmission spectrum is shown in Fig. 2. 

Previous studies have highlighted the importance of considering the effect 
of stellar limb darkening in stellar absorption lines on exoplanet transmission 
spectra. To investigate whether this could cause the strong feature at 10,833 Å, we 
re-ran the narrowband spectroscopic light-curve fits while fitting for a quadratic 
limb-darkening law. The resulting spectrum was consistent with our previous 
analysis within 1σ. 

Strong stellar lines that shift over the edges of pixels can introduce noise to measured 
transmission spectra. We checked this effect by smoothing our extracted 
time-series spectra with a Gaussian kernel with a full-width at half-maximum of 
four pixels, and re-running the narrowband spectroscopic light-curve fits. Our 
measured 10,833-Å absorption feature remained consistent within 1σ. 

MEarth observations. Photometric monitoring observations were performed 
using a single telescope of the MEarth-South array (CS 2015) at the Cerro 
Tololo Inter-American Observatory in Chile. Data were obtained on 78 nights from 
22 March 2017 (universal time) to 1 August 2017 in groups of four 15-s exposures, 
with these exposure groups repeated at a cadence of approximately 30 min. A total 
of 3,096 exposures were collected over this period. The bandpass of these observa- 
tions is in the red optical region, with the blue cut-off defined by RG715 glass to be 
approximately 7,150 A and the red cut-off defined by the decline of the quantum 
efficiency of the charge-coupled device (CCD) to be approximately 10,000 A. For 
our data reduction, we used our previously published methodology, modified 
for the specifics of the MEarth data. The CCD camera shutter failed on 9 May 
2017, which required removal for servicing. 

Because this procedure introduces flat-fielding errors that are not corrected to 
sufficient precision by standard calibrations, we allow for these errors explicitly 
in the analysis by solving for a change in the magnitude-zero points on both sides 
of the meridian at this date by following standard methods. The result of this 
analysis is a ‘least-squares periodogram’ (shown in Extended Data Fig. 3), obtained 
by simultaneously fitting a periodic modulation, while accounting for the four 
magnitude-zero points and two additional linear terms describing sources of sys- 
tematic errors in the photometry (the full-width at half-maximum of the stellar 
images, and the ‘common mode’ as a proxy for the effect of variable precipitable 
water vapour on the photometry). This procedure would be mathematically equivalent 
to a Lomb-Scargle periodogram in the absence of these six extra terms. The 
highest peak in the periodogram and its full-width at half-maximum correspond to 
a periodicity of 19.7 ± 0.9 days. This is consistent with estimates from Kepler light 
curves of 17.5 ± 1.4 days. We find an amplitude of about 0.00150 mag. 

Automated imaging telescope photometry. We acquired nightly photometric 
observations of WASP-107 with the Tennessee State University Celestron 14-inch 
(C14) automated imaging telescope (AIT), which is located at the Fairborn 
Observatory in southern Arizona. The observations were made in the Cousins 
R passband with an SBIG STL-1001E CCD camera. Differential magnitudes of 
WASP-107 were computed with respect to eight of the most constant comparison 
stars in the CCD field. Details of our data acquisition, reduction and analysis can 
be found in a previous publication, which describes a similar analysis of the 
planetary host star WASP-31. 

A total of 120 nightly observations (excluding a few observations in transit) 
were collected between 23 February and 28 June 2017. The nightly differential 
magnitudes are plotted in Extended Data Fig. 4a. Extended Data Fig. 4b and c 
shows the frequency spectrum of the observations and the phase curve computed 
with the best frequency. Our frequency analysis is based on least-squares sine fits 
with trial frequencies between 0.01 and 0.5 cycles per day, corresponding to periods 
between 2 and 100 days. The goodness of fit at each frequency is determined as 
the reduction factor in the variance of the original data. Low-amplitude brightness 
variability is seen at a period of 8.675 ± 0.043 days with a peak-to-peak amplitude 
of only 0.005 mag. This period is almost exactly half of the 17.5-day rotation period 
found from Kepler light curves and suggests that WASP-107 had spots or 
spot groups on opposite hemispheres of the star during the epoch of our observations. 
The WASP-107b discovery team also found periods of around 17 and 8.3 
days in their 2009 and 2010 photometry. 
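
The frequency analysis described above (least-squares sine fits over a grid of trial frequencies, scored by the reduction in variance) can be sketched in pure Python. This is an illustrative reimplementation, not the authors' code: the grid spacing, the function names and the synthetic test signal are all assumptions.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def variance_reduction(times, mags, freq):
    """Fractional reduction in variance after fitting a + b*sin + c*cos at freq."""
    basis = [[1.0, math.sin(2 * math.pi * freq * t), math.cos(2 * math.pi * freq * t)]
             for t in times]
    # Normal equations of the linear least-squares problem.
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * y for row, y in zip(basis, mags)) for i in range(3)]
    coeffs = solve3(ata, atb)
    mean = sum(mags) / len(mags)
    total = sum((y - mean) ** 2 for y in mags)
    resid = sum((y - sum(c * v for c, v in zip(coeffs, row))) ** 2
                for row, y in zip(basis, mags))
    return 1.0 - resid / total

def best_frequency(times, mags, f_min=0.01, f_max=0.5, n_trial=981):
    """Trial frequency (cycles per day) with the largest variance reduction."""
    freqs = [f_min + i * (f_max - f_min) / (n_trial - 1) for i in range(n_trial)]
    return max(freqs, key=lambda f: variance_reduction(times, mags, f))

# Synthetic check: 120 slightly irregular nightly epochs, 8.675-day sinusoid
# with a 0.005-mag peak-to-peak amplitude.
times = [i + 0.3 * math.sin(i) for i in range(120)]
mags = [0.0025 * math.sin(2 * math.pi * t / 8.675) for t in times]
f_best = best_frequency(times, mags)
```

The trial range of 0.01 to 0.5 cycles per day matches the 2 to 100 day period range quoted in the text; a real analysis would also need to account for noise and for the comparison-star corrections, which this sketch omits.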

Stellar variability correction. To correct for stellar variability between the G141 
and G102 epochs, we follow a method similar to that used in previous studies, 
and estimate the flux from the non-spotted stellar surface as F_0 = max(F) + kσ, 
where F is the photometric light curve, k is a fitted value and σ is the standard 
deviation of the light curve. We adopt the value k = 1, which was determined in a 
previous study. We use the best-fitting period, amplitude and ephemeris from 
the MEarth photometry to estimate the expected flux-dimming correction at the 
mid-transit times for both datasets. We use the wavelength-dependent spot correction 
factor developed in a previous work to correct for unocculted spots, and 
we set the spot temperature to be 3,200 K. After the correction, the two spectra 
align well and appear to share a flat baseline. The one overlapping spectral channel 
between G102 and G141 is consistent within 1σ. 
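
The estimate of the unspotted flux level, max(F) + kσ, is simple enough to sketch directly. The use of the population standard deviation and the helper `dimming_fraction` are assumptions for illustration; the text does not specify either.

```python
from statistics import pstdev

def unspotted_flux(light_curve, k=1.0):
    """Estimate the flux of the non-spotted surface as max(F) + k * sigma."""
    return max(light_curve) + k * pstdev(light_curve)

def dimming_fraction(flux_at_transit, light_curve, k=1.0):
    """Hypothetical helper: fractional dimming relative to the unspotted level."""
    return 1.0 - flux_at_transit / unspotted_flux(light_curve, k)

# Toy relative-flux light curve.
lc = [0.98, 1.0, 1.02]
f0 = unspotted_flux(lc)           # max(F) plus one standard deviation
dim = dimming_fraction(1.0, lc)   # dimming at an epoch where the flux is 1.0
```

In practice the flux at each mid-transit time would come from the fitted sinusoid (period, amplitude and ephemeris from the MEarth photometry) rather than from an individual data point.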

ATMO retrieval. For the combined G102 and G141 broadband spectrum, corrected 
for photospheric variability, we performed an atmospheric retrieval analysis 
using our 1D radiative transfer code, ATMO. We assumed an isothermal 
temperature-pressure profile and used MCMC to fit for the following parameters: 
atmospheric temperature, planetary radius at a pressure of 1 mbar, grey cloud 
opacity, and the abundances of H2O, CO2, CO, CH4, NH3, H2S, HCN and C2H2. We 
assumed solar abundances under chemical equilibrium for other gas species. We 
note that for this analysis we excluded wavelengths coinciding with the narrowband 
channel centred on the 10,833-Å helium triplet. Our best-fitting model is shown 
in Fig. 1, with χ² = 31.4 for 18 degrees of freedom. 

Assessing detector defects and random noise. We confirmed that the residuals 
for the pixel columns in each frame did not reveal any obvious anomalies over 
the narrow 10,833-A helium triplet, which suggests that this feature is not caused 
by detector defects or uncorrected cosmic rays. In addition, the transit depths 
remained consistent within 0.5σ when we removed one-third of the points in the light 
curves, in several random sub-sets, and refitted them with the same procedures 
as described above. 
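
The random-subset test can be sketched as below. The selection scheme (drawing the retained indices without replacement and preserving time order) and the function name are assumptions; the text states only that one-third of the points were removed in several random subsets.

```python
import random

def random_subset(points, keep_fraction=2 / 3, seed=None):
    """Keep a random fraction of the points, preserving their original order."""
    rng = random.Random(seed)
    n_keep = round(len(points) * keep_fraction)
    keep = sorted(rng.sample(range(len(points)), n_keep))
    return [points[i] for i in keep]

# Toy light curve of 90 exposure indices; one-third are dropped.
exposures = list(range(90))
subset = random_subset(exposures, seed=1)
```

Each subset would then be refitted with the same light-curve model, and the spread of the resulting transit depths compared with the quoted uncertainties.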

Absorption from other species. The strong absorption line of metastable 2³S 
helium at 10,833 Å aligns extremely well with the peak of the feature. In the 20-Å 
region surrounding this peak (10,820 to 10,840 Å), helium is the only species 
that contains absorption solely within this wavelength range and nowhere else 
within the G102 bandpass (8,060 to 11,170 Å). There is, for example, a strong 
silicon absorption line at 10,830 Å and a water line at 10,835 Å (vacuum wavelengths); 
however, if either species were the cause of the absorption seen in 
our transmission spectrum, there would be other similarly strong silicon lines 
measured at 10,588, 10,606 and 10,872 Å, and a water line at 10,929 Å, where 
we see no excess absorption. The other atoms with strong absorption lines near 
10,833 Å are Np, Cs, Fe, Th, S, Cr, V, Yb and Cu, all of which can be ruled out 
because they are either radioactive with short half-lives or have other strong 
transitions in the 8,060-11,170 Å wavelength range, which we do not observe. 
We have also found no species in the ExoMol or HITRAN/HITEMP 
databases with sufficiently sharp features aligned at 10,833 Å. Specifically, we 
searched the following species: CH4, CO2, HCN, NH, CH, OH, PO, NO, VO, 
TiO, CN, C2, PH3, NH3, SiO, CaO, H3+, CO, H2CO, C2H2, BeH, LiH, HCl, 
AlO, SO2, H2S, PN, KCl, NaCl, CS, CP, PS, MgH, NaH, CrH, CaH, FeH and 
ScH. We therefore conclude that absorption by metastable helium at 10,833 Å 
is the most plausible explanation for the signal detected in the narrowband 
transmission spectrum. 

Assessing Earth's exosphere. Where Earth's exosphere is illuminated by extreme 
ultraviolet radiation from the Sun, there is metastable helium. At an altitude of 
about 500 km, HST passes right through Earth's exosphere and, when it is not 
in Earth's shadow, it passes through regions containing metastable helium. The 
change in abundance of the metastable state throughout HST's orbit has been 
shown to impart a time-varying background signal in the 10,833-Å line on the 
timescale of one spacecraft orbit of about 95 min. There is no telluric metastable 
helium in Earth's shadow and, as expected, there is no substantial excess absorption 
at 10,833 Å while HST is in Earth's shadow. Telluric helium does, however, affect HST 
measurements at dawn and dusk, that is, when the spacecraft passes through the 
Sun-illuminated upper atmosphere. The magnitude of the effect is correlated with 
the solar activity cycle: more activity means more ultraviolet radiation and 
more metastable helium. The effect of spatially diffuse telluric helium emission on 
WFC3 slit-less spectroscopy is to impart an increased sky background signal across 
the detector. At the time of the observations, we were approaching solar minimum, 
and the 10.7-cm radiation (which is a proxy for solar activity) was only 70 solar 
flux units (sfu), according to the Solar Monitoring Program of Natural Resources 
Canada (http://www.spaceweather.gc.ca/solarflux/sx-en.php). According to the 
WFC3 instrument report, observations only appear considerably affected when 
the 10.7-cm flux is greater than about 100 sfu. 

Nonetheless, to test whether metastable helium at dawn and dusk in Earth's 
atmosphere could cause an anomalous absorption feature in our transmission 
spectrum, we removed the first and last four exposures of each orbit (which 
encompass the initial and final 10 min), when HST passed through the illuminated 
dusk and dawn exosphere, and refitted the light curves. The results were 
consistent with previous analysis at less than 1σ, which indicates that emission 
from telluric helium is not the cause of the narrowband absorption feature in our 
data. We note that previous transit spectroscopic studies using G102 did not 
show excess absorption at 10,833 Å. 

Assessing the stellar chromosphere. We also considered the possibility that the 
absorption feature that we measured at 10,833 Å could be a result of stellar activity, 
as the metastable 2³S state of helium is formed in the inhomogeneous upper chromospheres 
and coronae of stars via photo-ionization, recombination and collisional 
excitation. The passage of the planet over quiet regions with less 10,833-Å helium 
absorption could, in theory, increase the relative transit depth at this wavelength 
and thus mimic an exoplanet atmospheric feature. 

Theoretical models of chromospheres predict the maximum equivalent 
width of the 10,833-Å helium line in the spectra of F- to early K-type stars to be 
about 0.4 Å. Being a K6 star, WASP-107 lies just outside the valid range of spectral 
types for this model. However, in the following we show that to match our observed 
transmission spectral feature, the nominal chromospheric absorption of the WASP-107 
host star at 10,833 Å would need to be five times stronger than any isolated 
(that is, non-multiple), main-sequence dwarf star measured until now. 

After searching the literature for all 10,833-Å helium triplet equivalent-width 
measurements of isolated dwarf stars, we found more than 300 measurements of 
over 100 distinct stars, including 23 measurements of 11 different stars of similar 
spectral type to WASP-107 (K5-K7). We found no measurements greater than 
0.409 Å. We took an additional measurement of the K6 star GJ380 with the 
Near-Infrared Echelle Spectrograph (NIRSpec) on Keck and found an equivalent 
width of 0.311 Å (A. Dupree, private communication). 

Furthermore, it has been shown that the equivalent width of the 10,833-Å 
line is related to that of another neutral helium absorption line, at 5,876 Å. The 
5,876-Å line is produced by the transition from the 3³D to the 2³P state. As such, 
the 5,876-Å line forms in the same regions of the stellar chromosphere as the 
10,833-Å triplet (which corresponds to the 2³S–2³P transition). Extended Data 
Fig. 5 shows the equivalent-width measurements of the 10,833-Å and 5,876-Å lines 
in a survey of 31 FGK stars. A strong correlation is apparent. 

To investigate the 5,876-Å helium line of WASP-107, we co-added high-resolution 
spectra obtained with the High Accuracy Radial velocity Planet 
Searcher (HARPS) spectrograph (European Southern Observatory programme 
093.C-0474(A)). These spectra cover a wavelength range of 3,800-6,900 Å 
(Extended Data Fig. 5). We fitted the co-added spectrum for the equivalent width 
of the 5,876-Å helium line, with the result indicated in Extended Data Fig. 5 
as a yellow-shaded region. We find that the equivalent width of this feature is 
similar to that measured for other single dwarf stars, with no evidence of unusual 
activity. Given the well established correlation between the equivalent widths 
of the 5,876-Å and 10,833-Å helium lines noted above, this provides further 
evidence against the WASP-107 host star having an abnormally deep 10,833-Å 
line. In addition, we measured the Ca II H- and K-line emission S-index (S_HK) 
for WASP-107 from the HARPS spectra and found a night-averaged value of 
S_HK = 1.26 ± 0.03 (A.W., private communication), which is a moderate value 
for a K6 star. 

We therefore adopt the maximum equivalent width of 0.4 Å to estimate an upper 
limit for the amplitude of a feature that could be caused by unocculted 10,833-A 
helium absorption of stellar origin in our 98-A-wide spectroscopic channel. We 
consider the limiting case in which WASP-107b occults only quiet regions of the 
star, where we assume there is no 10,833-A line absorption. This is the scenario 
in which the maximum amount of stellar-continuum flux at 10,833 A would be 
blocked out by the planet, which we treat as a fully opaque disk. We estimate the 
increased transit depth to be 

$$D_{\rm activity} = \frac{A_{\rm pl}}{1 - W_{\rm He}/W_{\rm bin}} = 2.064\% \pm 0.005\%$$

where A_pl = 2.056% ± 0.005% is the fraction of the stellar area occulted by the 
planet, W_He = 0.4 Å is the maximum equivalent width of the stellar absorption 
feature, and W_bin is the width of the spectral bin (98 Å). This gives an upper limit 
of the feature caused by stellar activity, δD_activity = D_activity − A_pl = 0.008% ± 0.005%, 
which is less than one-fifth of the measured size of the feature (0.049% ± 0.011%). 
We therefore conclude that the observed absorption feature cannot be caused by 
stellar chromospheric spatial inhomogeneity alone. 
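
The upper limit can be reproduced numerically. This sketch reads the expression above as the occulted area fraction divided by (1 − W_He/W_bin), which is the form that recovers the quoted 2.064%; the function and variable names are illustrative.

```python
def activity_depth(area_fraction, w_line, w_bin):
    """Transit depth if the planet occults only line-free (quiet) regions."""
    return area_fraction / (1.0 - w_line / w_bin)

a_pl = 2.056                  # occulted fraction of the stellar disk, per cent
d_activity = activity_depth(a_pl, w_line=0.4, w_bin=98.0)
delta_d = d_activity - a_pl   # excess depth attributable to stellar activity
ratio = delta_d / 0.049       # compared with the measured 0.049% feature
```

The excess comes out at about 0.008%, or roughly 17% of the measured feature, in line with the statement that it is less than one-fifth of the signal.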

Resolution-linked bias. If an absorption line overlaps in both a stellar and a plane- 
tary atmosphere spectrum, and the line is unresolved in the measured transmission 
spectrum, then the planetary absorption can be underestimated. The effect is called 
resolution-linked bias (RLB). For the 10,833-Å line in the WASP-107 system 
this dilution effect competes with the possible overestimation of the signal from 
unocculted chromospherically active regions (as described in the Methods section 
‘Assessing the stellar chromosphere’). The magnitudes of both effects depend on 
whether the planet transits in front of active or quiet regions of the star. The RLB 
effect would be largest if the planet transited only chromospherically active regions 
(which have the highest 10,833-A line absorption). We estimated the magnitude of 
the RLB effect in this limiting case following the method described in a previous 
publication and assuming an equivalent width of 0.4 Å for the 10,833-Å stellar 
line. For a measured absorption excess of 0.049% ± 0.011% in a 98-Å bin centred 
on the 10,833-Å line, we could be underestimating the planetary absorption by 
up to 0.009% (that is, about one-fifth of the measured signal). However, without 
knowing which part of the chromosphere the planet transits, the stellar line profile 
and the velocity structure of the planetary helium signature, we cannot accurately 
estimate the magnitudes of the competing effects. 

Stellar flares. Because the 10,833-Å He line appears in the emission of solar (and 
presumably stellar) flares, active stars like WASP-107 could show short-term 
variability in this line, which may be difficult to disentangle from a transiting planetary 
signal. Flares are unlikely to mimic the signal that we detect perfectly, because 
the planet would need to pass in front of flaring regions of the star throughout 
the duration of the transit. Instead, unocculted flares could dilute atmospheric 
absorption of the 10,833-Å He line. Visual inspection of the raw light curve of 
the spectroscopic channel centred on 10,833 Å shows no evidence of flare events. 
Additionally, the pre- and post-transit flux levels agree with each other, which 
would not be the case if there was substantial 10,833-Å emission from the tail of 
a flare. As a precaution, we reproduced the narrowband transmission spectrum 
around the 10,833-Å line using different combinations of the out-of-transit baseline: 
first, with only orbits 2 and 4; then, with orbits 1 and 3; and then with orbits 
2 and 5. All three cases gave a 10,833-Å absorption feature that was consistent 
within 1σ with that of our full fit. 

Photospheric spots and faculae. To quantify the effect of a heterogeneous 
photosphere on the transmission spectrum around 10,833 Å, we used a variability 
modelling method that uses an ensemble of model stellar photospheres with 
randomly located active regions to estimate the fraction of the stellar surface covered 
by photospheric spots and faculae for a given rotational variability amplitude. 
While variability monitoring traces only the non-axisymmetric component of the 
stellar heterogeneity, and thus provides a lower limit on active-region covering 
fractions, this numerical approach provides a more complete understanding of 
the range of covering fractions that may correspond to an observed variability 
level. The model describes the integrated full-disk spectrum by the combination of 
three components: the immaculate photosphere, spots and faculae. We used three 
spectra interpolated from the PHOENIX model grid with log(g) = 4.5, metallicity 
[M/H] = +0.02 and different temperatures to represent the three components. 
Following previous works, we set the photosphere temperature, T_phot, as the effective 
temperature of the star (T_eff = 4,430 K) and adopted scaling relations for the 
spot temperature, T_spot, and faculae temperature, T_fac. 

Thus, the temperatures of the three components are T_phot = T_eff = 4,430 K, 
T_spot = 0.73 × T_phot = 3,230 K, and T_fac = T_phot + 100 K = 4,530 K. The paper 
describing the discovery of WASP-107b reports a 17-day periodic modulation with 
a 0.4% semi-amplitude (0.8% full amplitude) for WASP-107. Assuming a typical 
spot radius of r_spot = 2°, we find that the reported rotational variability could be 
caused by a spot filling fraction of f_spot = 4% (with an asymmetric 1σ confidence interval) if the 
variability is due to spots alone. In the more realistic case, in which spots and 
faculae are both contributing to the variability, we find f_spot = 8% and 
f_fac = 53%, again with asymmetric 1σ intervals. The covering fractions that we report are means over the entire 
model photosphere; they do not take into account relative over- or under-abun- 
dances of magnetic features on the Earth-facing hemisphere during a transit. 
Therefore, in the worst-case scenario, they could underestimate the hemispheric 
covering fractions by a factor of 2. However, the 1o confidence intervals, which 
are derived from 100 model realizations with randomly selected active region loca- 
tions, are deliberately conservative to account for this. Extended Data Fig. 6 shows 
how unocculted photospheric stellar heterogeneities could affect the transmission 
spectrum, assuming that the planet transits a chord of immaculate photosphere. 
The stellar contamination factor, ε, on the y axis is multiplied by the true (R_p/R_s)² 
transit depth to produce the observed transmission spectrum; that is, ε > 1 means 
that the observed transit depth is larger than expected from the planetary atmosphere 
model. The spots-plus-faculae model does not predict an increase in transit 
depth at 10,833 Å. No sharp features are apparent around 10,833 Å. Instead, the 
model predicts that transit depths should be inflated by about 1% across the full 
wavelength range of G102, with perhaps some features apparent at about 8,500 Å 
and 8,900 Å (for this reason, we only use the 8,780-11,370 Å region in our full 
transmission spectrum, even though the G102 throughput extends down to 
8,000 Å). The strong absorption feature that we measure is therefore unlikely to be 
caused by photospheric inhomogeneity. 
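
The component temperatures and the way the contamination factor enters can be written out in a few lines. This is only a sketch of the scaling relations quoted above and of the multiplicative definition of the contamination factor; the function names are hypothetical, and the 1.01 factor applied to a 2.0% depth is an illustrative round number based on the "about 1%" inflation mentioned in the text.

```python
def component_temperatures(t_eff, spot_scale=0.73, facula_offset=100.0):
    """Return (T_phot, T_spot, T_fac) in kelvin from the quoted scaling relations."""
    t_phot = t_eff
    return t_phot, spot_scale * t_phot, t_phot + facula_offset

def observed_depth(true_depth, epsilon):
    """Observed transit depth = contamination factor x true (R_p/R_s)^2 depth."""
    return epsilon * true_depth

t_phot, t_spot, t_fac = component_temperatures(4430.0)
inflated = observed_depth(2.0, 1.01)   # a 2.0% depth inflated by about 1%
```

A wavelength-independent factor of this kind shifts the whole spectrum up slightly but cannot create a narrow feature, which is the point made in the paragraph above.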

1D escaping-atmosphere model. Here we give a brief overview of the first model 
that we used to investigate the narrowband transmission spectrum at 10,833 Å, 
which is described in more detail in a previous publication. This 1D model is 
based on the assumption that a thermosphere of a close-in exoplanet can be well 
represented by the density and velocity profiles of an isothermal Parker wind driven 
by gas pressure. We assume a composition of atomic hydrogen (90% by number) 
and helium (10%). We find the solution for the hydrogen ionization balance and 
the distribution of helium atoms in the ground, excited 2³S and ionized states. The 
physical processes taken into account in the helium balance are photoionization 
from the ground and 2³S states, recombination to the singlet and triplet states, 
collisional transitions between the triplet 2³S state and states in the helium singlet 
ladder (which includes collisions with both free electrons and neutral hydrogen 
atoms) and the radiative decay from the 2³S state to the ground state. The photoionization 
rates are calculated using the ultraviolet stellar flux of a K6 star, HD 85512, 
taken from the MUSCLES survey (version 2.1), placed at the orbital distance 
of WASP-107b. The equations used to compute the hydrogen and helium distributions, 
along with all the relevant reaction rate coefficients and cross-sections, 
are described in a previous paper. We changed only the input parameters (such 
as the mass and radius of the planet and its host star, as well as the input stellar 
spectrum) so that they match the properties of WASP-107b. 

On the basis of the obtained density profile of helium in the 2³S state, we calculate 
the optical depth and the in-transit absorption signal at 10,833 Å, assuming 
that a planet with a spherically symmetric thermosphere transits across the centre 
of the stellar disk. For a planet of given mass and radius, the wind temperature and 
the total mass-loss rate are free parameters in the model. Using results from the 
literature, we explore the temperature range 5,000-13,000 K. To produce an 
absorption signal consistent with our measurement, the required mass-loss rate is 
between 10^10 g s^-1 and 3 × 10^11 g s^-1. 

3D escaping-atmosphere model. Our second model has previously been used 
to interpret the escaping exosphere of the Neptune-mass exoplanet GJ436b. 
The model considers neutral helium atoms that are released from the top of the 
thermosphere and subjected to planetary and stellar gravity, radiation pressure and 
photoionization. We found that the data are well explained by 2³S helium atoms 
escaping at a rate of 10^6-10^7 g s^-1. The stellar radiation pressure on the escaping 
helium atoms is stronger than the counter-balancing stellar gravity by a factor of 
approximately 10 and 50 for the weakest and strongest of the 10,833-Å triplet lines, 
respectively. Thus, the gas is blown away from the exoplanet so swiftly as to form 
a tail nearly aligned with the star–planet axis. 

Code availability. The custom code used to extract the HST spectra from the raw 
data frames is available upon request. The HST light-curve fitting was performed 
using the open-source BATMAN (https://github.com/lkreidberg/batman) and 
emcee codes (http://github.com/dfm/emcee), and the proprietary RECTE code. 
The ATMO code used to compute the lower-atmosphere models is currently 
proprietary, as are the 1D and 3D upper-atmosphere codes. 

Data availability. Raw HST data frames are publicly available online at the 
Mikulski Archive for Space Telescopes (MAST; https://archive.stsci.edu). 

Extended Data Fig. 1 | G102 white-light curve and broadband 
spectroscopic light curves covering the wavelength range 0.88-1.14 μm 
for WASP-107b. a, Relative flux of the white-light curve with respect 
to systematics model results (blue points), with the best-fitting transit 
light curve plotted in black. b, White-light residuals and 1σ errors, after 
removing the combined transit and systematics components of the 
best-fitting model. c, Points are spectroscopic light curves divided by 
systematics model results, and black curves are best-fitting transit models, 
with vertical offsets applied for clarity. d, Best-fitting spectroscopic model 
residuals, with vertical offsets applied for clarity. Differently coloured 
points in c and d are used to highlight separate channels. 

... best-fitting transit models. b, Best-fitting model residuals, with vertical ... 

Extended Data Fig. 3 | Ground-based photometry for WASP-107 from 
MEarth. We performed a Lomb-Scargle periodogram search and found 
a best-fitting period of 19.7 ± 0.9 days (dashed red line), with a relative 
amplitude of about 0.15%. Solid red lines show the results of the best-fitting 
sinusoidal model. 

Extended Data Fig. 4 | Ground-based photometry for WASP-107b from 
AIT. a, Nightly photometric observations of WASP-107, acquired with 
Tennessee State University’s C14 AIT at the Fairborn Observatory during 
the 2017 observing season. The number of observations (N_obs) was 120. 
ΔR is the relative flux in the Cousins R band. b, The frequency spectrum 
of the 2017 observations shows low-amplitude variability with a period (P) 
of 8.675 days (a frequency, f, of 0.115 cycles per day). c, The data phased 
to the 8.675-day period have a peak-to-peak amplitude of only 0.005 mag. 
HJD, heliocentric Julian Date; UCT, coordinated universal time; c/d, cycles 
per day; T_min, best-fit ephemeris. 

Extended Data Fig. 5 | Equivalent widths of helium 5,876-Å and 
10,830-Å lines. a, Measurements for 30 stars of different colour indices 
with 1σ errors from a previous work. The two helium lines are expected 
to form in the same stellar-atmosphere regions and their equivalent 
widths are clearly correlated. Our 5,876-Å line measurement for WASP-107 
(colour index B-V > 0.7), obtained from HARPS spectra, is plotted 
as a red line and the red-shaded region shows the 1σ error. b, Co-added 
spectra of WASP-107b around the 5,876-Å line of metastable helium from 
the HARPS radial-velocity campaign (blue line). Absorption lines are 
fitted with Gaussian profiles, and the best-fit results are shown as green, 
yellow and red lines, with the sum of the profiles shown in black. The best-fitting 
line profile of the 5,876-Å line is shaded in grey. 

Extended Data Table 1 | Fitted parameters from the G102 white-light curve 
Parameter       Value 
Rp/Rs           0.142988 ± 0.00012 
t0 (BJD_UTC)    2,457,904.7295 ± 0.0002 
c0              1.00004-2e-5 
c1              −0.0018 ± 0.0002 
Spop            62 ± 17 
=               4246 
os              −2 ± 10 
of              654 
B               1.73 ± 0.15 
P               5.72147° 
i (°)           89.73 
a/Rs            18.164° 
e (assumed)     0 


Errors quoted were calculated using 68% of the MCMC samples after burn-in. BJD, barycentric Julian date; UTC, coordinated universal time. 


a, Parameters fixed from Dai & Winn’. 
Extended Data Table 2 | All results from transit light-curve fits 


Wavelength (Å)   Transit depth (%)   Error (%)   RMS (PPM)   RMS/phot.   Correction factor 
8,769- 9,063 2.0451 0.0084 326 1.178 1.007101 
9,063 - 9,356 2.0425 0.0069 276 1.077 1.006785 
9,356 - 9,650 2.0514 0.0079 285 1.184 1.006549 
9,650 - 9,943 2.0514 0.0064 252 1.083 1.006454 
9,943 - 10,237 2.0456 0.0066 264 1.167 1.006340 
10,237 - 10,530 2.0448 0.0058 241 1.080 1.006303 
10,530 - 10,775 2.0431 0.0065 245 1.048 1.006162 
10,873 - 11,142 2.0461 0.007 269 1.152 1.006123 
11,142 - 11,386 2.0509 0.0069 298 1.198 1.005945 
10,579 - 10,677 2.0634 0.0091 344 0.989 1.00596 
10,604 - 10,701 2.0500 0.0088 381 1.102 1.005923 
10,628 - 10,726 2.0604 0.0089 366 1.061 1.006214 
10,652 - 10,750 2.0571 0.0075 336 0.976 1.006167 
10,677 - 10,775 2.0563 0.0082 360 1.043 1.006131 
10,701 - 10,799 2.0643 0.0103 395 1.143 1.006046 
10,726 - 10,824 2.0830 0.0094 354 1.023 1.005985 
10,750 - 10,848 2.0964 0.0102 415 1.198 1.005928 
10,775 - 10,873 2.1048 0.0097 391 1.126 1.005923 
10,799 - 10,897 2.0998 0.0084 387 1.117 1.005948 
10,824 - 10,922 2.0870 0.0091 390 1.128 1.005949 
10,848 - 10,946 2.0585 0.0095 409 1.183 1.006008 
10,873 - 10,970 2.0546 0.0104 385 1.111 1.005982 
10,897 - 10,995 2.0634 0.0108 423 1.220 1.005973 
10,922 - 11,019 2.0642 0.0098 377 1.087 1.005967 
10,946 - 11,044 2.0543 0.0093 363 1.046 1.005935 
10,970 - 11,068 2.0502 0.0101 375 1.084 1.005962 
10,995 - 11,093 2.0584 0.0103 373 1.082 1.005918 
11,019 - 11,117 2.0564 0.0098 385 1.117 1.005897 
11,044 - 11,142 2.0631 0.0105 414 1.197 1.005891 
Modified, previously published results! 
11,210 - 11,450 2.0723 0.0059 1.003979 
11,450 - 11,710 2.0814 0.0055 1.003919 
11,710 - 11,960 2.0585 0.0056 1.003918 
11,960 - 12,220 2.0577 0.0054 1.003848 
12,220 - 12,480 2.0535 0.0059 1.003892 
12,480 - 12,720 2.0572 0.0050 1.003897 
12,720 - 12,980 2.0699 0.0062 1.003830 
12,980 - 13,230 2.0818 0.0050 1.003805 
13,230 - 13,490 2.0742 0.0057 1.003983 
13,490 - 13,740 2.0943 0.0048 1.004081 
13,740 - 14,010 2.0878 0.0048 1.004059 
14,010 - 14,250 2.0974 0.0052 1.004110 
14,250 - 14,520 2.0907 0.0062 1.004126 
14,520 - 14,760 2.0777 0.0051 1.004136 
14,760 - 15,020 2.0767 0.0069 1.004107 
15,020 - 15,280 2.0762 0.0067 1.004020 
15,280 - 15,520 2.0593 0.0060 1.004116 
15,520 - 15,790 2.0562 0.0064 1.004007 
15,790 - 16,030 2.0581 0.0056 1.003941 
16,030 - 16,290 2.0595 0.0065 1.003969 
Modified results from a previous study? are included. RMS denotes the root mean square of the model residuals in parts per million (PPM). The second-to-last column is the RMS divided 
by the expected photon noise. The last column is the correction factor that we applied to account for stellar variability. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Table 3 | Results from ATMO retrieval code for the lower atmosphere 


Parameter            Limits from MCMC 
Temperature (K)      6507 25 
Rp/Rs at 1 mbar      0.914": -0.014 
VMR log10(H2O)       -1.7 7.06 
VMR log10(CO2)       <10 
VMR log10(CO)        <11 
VMR log10(CH4)       <10 
VMR log10(NH3)       <10 
VMR log10(H2S)       <11 
VMR log10(HCN)       <11 
VMR log10(C2H2)      <=10 


VMR stands for volume mixing ratio. Uncertainties for the temperature, Rp/Rs and the VMR of H2O were 
calculated using 68% of the MCMC samples after burn-in. Upper limits are from 1σ MCMC errors. 




LETTER 


OPEN 


https://doi.org/10.1038/s41586-018-0017-2 


Characterization of the 1S-2S transition in antihydrogen 


M. Ahmadi, B. X. R. Alves, C. J. Baker, W. Bertsche, A. Capra, C. Carruth, C. L. Cesar, M. Charlton, S. Cohen, 
R. Collister, S. Eriksson, A. Evans, N. Evetts, J. Fajans, T. Friesen, M. C. Fujiwara, D. R. Gill, J. S. Hangst, W. N. Hardy, 
M. E. Hayden, C. A. Isaac, M. A. Johnson, J. M. Jones, S. A. Jones, S. Jonsell, A. Khramov, P. Knapp, L. Kurchaninov, 
N. Madsen, D. Maxwell, J. T. K. McKenna, S. Menary, T. Momose, J. J. Munich, K. Olchanski, A. Olin, P. Pusa, 
C. Ø. Rasmussen, F. Robicheaux, R. L. Sacramento, M. Sameed, E. Sarid, D. M. Silveira, G. Stutter, C. So, T. D. Tharp, 
R. I. Thompson, D. P. van der Werf & J. S. Wurtele 


In 1928, Dirac published an equation! that combined quantum 
mechanics and special relativity. Negative-energy solutions to 
this equation, rather than being unphysical as initially thought, 
represented a class of hitherto unobserved and unimagined 
particles—antimatter. The existence of particles of antimatter was 
confirmed with the discovery of the positron” (or anti-electron) by 
Anderson in 1932, but it is still unknown why matter, rather than 
antimatter, survived after the Big Bang. As a result, experimental 
studies of antimatter*’, including tests of fundamental symmetries 
such as charge-parity and charge-parity-time, and searches for 
evidence of primordial antimatter, such as antihelium nuclei, have 
high priority in contemporary physics research. The fundamental 
role of the hydrogen atom in the evolution of the Universe and in the 
historical development of our understanding of quantum physics 
makes its antimatter counterpart—the antihydrogen atom—of 
particular interest. Current standard-model physics requires that 
hydrogen and antihydrogen have the same energy levels and spectral 
lines. The laser-driven 1S-2S transition was recently observed? in 
antihydrogen. Here we characterize one of the hyperfine components 
of this transition using magnetically trapped atoms of antihydrogen 
and compare it to model calculations for hydrogen in our apparatus. 
We find that the shape of the spectral line agrees very well with that 
expected for hydrogen and that the resonance frequency agrees 
with that in hydrogen to about 5 kilohertz out of 2.5 × 10¹⁵ hertz. 
This is consistent with charge-parity-time invariance at a relative 
precision of 2 × 10⁻¹² (two orders of magnitude more precise than 
the previous determination®), corresponding to an absolute energy 
sensitivity of 2 × 10⁻²⁰ GeV. 
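
The precision figures in the abstract follow from simple ratios. The sketch below reproduces them from the rounded values quoted in the text (5 kHz, 2.5 × 10¹⁵ Hz and a transition energy of about 10.2 eV); the variable names are illustrative and the result is only as precise as those rounded inputs.

```python
# Worked arithmetic for the precision figures quoted in the abstract.
# All inputs are the rounded values given in the text.
delta_f_hz = 5e3      # agreement with hydrogen: about 5 kHz
f_1s2s_hz = 2.5e15    # 1S-2S transition frequency: about 2.5 x 10^15 Hz
e_1s2s_ev = 10.2      # transition energy: about 10.2 eV

# Relative precision of the frequency comparison.
relative_precision = delta_f_hz / f_1s2s_hz

# Absolute energy sensitivity: the same fraction of the transition energy,
# converted from eV to GeV.
energy_sensitivity_gev = relative_precision * e_1s2s_ev * 1e-9

print(f"relative precision: {relative_precision:.1e}")
print(f"energy sensitivity: {energy_sensitivity_gev:.1e} GeV")
```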

The transition of interest here, between the ground state and the first 
excited state of antihydrogen, has an energy of about 10.2 eV. The 
frequency of this transition in hydrogen has been measured’ to a few parts 
in 10¹⁵. We previously demonstrated’ the existence of the transition 
in antihydrogen, localizing the frequency to a few parts in 10¹⁰. Here 
we characterize the spectral line shape of the transition to the limits of 
precision of our current apparatus. 

Matter and antimatter annihilate each other, so antihydrogen must be 
synthesized and then held in ultrahigh vacuum, in isolation from mat- 
ter, to be studied. The ALPHA-2 apparatus at CERN (Fig. 1) combines 
antiprotons from the antiproton decelerator? with positrons from a 
positron accumulator |! to produce and trap”? atoms of antihydrogen. 
Antihydrogen can be trapped in ALPHA-2’s magnetic multipole trap if 


it is produced with a kinetic energy of less than 0.54 K in temperature 
units. The techniques that we use to produce antihydrogen that is cold 
enough to trap are described elsewhere’*-“. In round numbers, a typi- 
cal trapping trial in ALPHA-2 involves mixing 90,000 antiprotons with 
3,000,000 positrons to produce 50,000 antihydrogen atoms, about 20 of 
which will be trapped. The anti-atoms are confined by the interaction 
of their magnetic moments with the inhomogeneous magnetic field. 
The cylindrical trapping volume for antihydrogen has a diameter of 
44.35 mm and a length of 280mm. 
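
To put the round numbers above in perspective, the following sketch converts the 0.54 K trap depth into energy units and computes the yields at each stage of a trapping trial. The Boltzmann constant is the CODATA value; the variable names are illustrative.

```python
# Trap depth and per-trial yields from the round numbers in the text.
K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K (CODATA 2018)

trap_depth_k = 0.54             # maximum trappable kinetic energy, in kelvin
antiprotons = 90_000            # antiprotons mixed per trial
produced = 50_000               # antihydrogen atoms produced per trial
trapped = 20                    # antihydrogen atoms trapped per trial

trap_depth_uev = K_B_EV_PER_K * trap_depth_k * 1e6   # depth in micro-eV
formation_fraction = produced / antiprotons          # antiprotons -> atoms
trapped_fraction = trapped / produced                # atoms -> trapped atoms

print(f"trap depth: {trap_depth_uev:.1f} micro-eV")
print(f"formation fraction: {formation_fraction:.2f}")
print(f"trapped fraction: {trapped_fraction:.1e}")
```

The trap is therefore more than five orders of magnitude shallower than the 10.2-eV transition being probed, and only a few atoms in 10,000 are cold enough to be held.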

The key to anti-atomic spectroscopy, as developed so far’ , is to 
illuminate a sample of trapped antihydrogen atoms with electromag- 
netic radiation (microwaves or laser photons) that causes atoms to be 
lost from the trap if the radiation is on resonance with the transition of 
interest. ALPHA-2’s silicon vertex detector!” (Fig. 1) affords us single- 
atom detection capability for the annihilation events associated with 
lost antihydrogen atoms or antiprotons that encounter the walls of the 
apparatus. The silicon vertex detector tracks the charged pions from 
the antiproton annihilation, and various reconstruction algorithms are 
used to determine the location (vertex) of each annihilation and to dis- 
tinguish antiprotons from cosmic-ray background using multivariate 
analysis’® (Methods). 

To excite the 1S—2S transition, we use a cryogenic, in vacuo enhance- 
ment cavity (Fig. 1) for continuous-wave light from a 243-nm laser 
system (Methods) to boost the intensity in the trapping volume. Long 
interaction times are possible, because the anti-atoms have a storage 
lifetime of at least 60h in the trap. Two counter-propagating photons 
can resonantly excite the ground-state atoms to the 2S state. Absorption 
of a third photon ionizes the atom, leading to loss of the antiproton 
from the trap. Atoms that decay from the 2S to the 1S state via coupling 
to the 2P state may also be lost, owing to a positron spin-flip””. 
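
The photon bookkeeping behind this excitation scheme can be checked directly. The sketch below uses the nominal 243-nm wavelength from the text and CODATA constants; the hydrogen ionization energy (about 13.6 eV) is a standard value that is not quoted in the text.

```python
# Photon energies for the two-photon 1S-2S excitation and the
# subsequent photo-ionization by a third photon.
H_EV_S = 4.135667696e-15          # Planck constant in eV s
C_M_S = 299_792_458.0             # speed of light in m/s
HYDROGEN_IONIZATION_EV = 13.598   # ground-state ionization energy (standard value)

wavelength_m = 243e-9             # nominal laser wavelength from the text

photon_ev = H_EV_S * C_M_S / wavelength_m
two_photon_ev = 2 * photon_ev     # energy delivered to reach the 2S state
three_photon_ev = 3 * photon_ev   # energy after absorbing a third photon

# Two 243-nm photons carry the same energy as one photon at half the wavelength.
equivalent_wavelength_nm = wavelength_m * 1e9 / 2

print(f"one photon:    {photon_ev:.3f} eV")
print(f"two photons:   {two_photon_ev:.3f} eV (equivalent to {equivalent_wavelength_nm:.1f} nm)")
print(f"three photons: {three_photon_ev:.3f} eV, ionizes: {three_photon_ev > HYDROGEN_IONIZATION_EV}")
```

This is why frequencies and uncertainties in the text are quoted either "at 243 nm" (the laser) or "at 121 nm" (the transition), differing by a factor of two.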

Referring to the energy-level diagram of hydrogen in Fig. 2, there 
are two trappable, hyperfine substates of the 1S ground state (labelled 
‘c’ and ‘d’). In practice, we find that these states are, on average, equally 
populated in our trap: N_c = N_d = N_i/2, where N_i is the number of 
ground-state atoms that are initially trapped in an experimental trial. 
The 2S state has corresponding hyperfine levels, and we refer to the 
transitions between the two manifolds as d-d (Fig. 2) and c-c (not 
pictured). 

For each experimental trial, we first accumulate antihydrogen atoms 
from three mixing cycles or ‘stacks’!? and then remove any leftover 
charged particles using pulsed electric fields. After a wait of about 10 s 
to allow any excited atoms to decay to the ground state, the trapped 
population is exposed to laser radiation at a fixed frequency for 300 s. 
The frequencies used here were chosen to probe only the d-d transition 
(Fig. 2). Following the laser exposure, we use microwave radiation 
to remove the 1S_c state atoms by driving a resonant spin-flip'> '°. 
The microwave frequency is scanned over 9 MHz in 32 s; these 
parameters and the injected power level (160 mW at the vacuum 
feedthrough) are chosen to eject anti-atoms quickly while minimizing the 
perturbation of the vacuum and cryogenic environment. The silicon 
vertex detector is used to detect annihilations of antihydrogen atoms 
that are lost during the laser and microwave exposures. Finally, the 
atom-trap magnets are ramped down in 1.5 s, so that any surviving 
anti-atoms would be released and their annihilations detected. If the 
microwave removal of 1S_c-state atoms is 100% effective, then the 
surviving particles would be only 1S_d-state atoms that were not removed 
by laser action. 


Fig. 1 | The ALPHA-2 central apparatus and magnetic field profile. 
a, b, Penning traps, comprising stacks of cylindrical electrodes immersed 
in a uniform axial magnetic field generated by an external solenoid (not 
shown), are used to confine and manipulate antiprotons (p̄) and positrons 
(e⁺) to produce antihydrogen. Cold (less than 0.5 K) anti-atoms can be 
trapped radially by the octupole field and axially by the magnetic well that 
is formed by the five mirror coils and plotted in b. The 243-nm laser light 
is injected from the antiproton side (left in a) and is aligned and 
position-stabilized on the fixed optical cavity axis. The laser beam crosses 
the trap axis at an angle of 2.3°. The piezoelectric actuator behind the output 
coupler is used to modulate the cavity length to lock the cavity to the laser 
frequency. The axial scale in a and b is the same; the radial extent of the 
annihilation detector is larger than illustrated. The vacuum window and 
photo-diode are further to the right (by about 1 m) than illustrated. The 
brown-shaded electrodes are used to apply blocking potentials during the 
experimental trials to ensure that antiprotons that result from ionization 
are confined to annihilate in the active volume of the detector’. 


We collected data for nine different laser frequencies in four sets. 
Each set involved four distinct frequencies and 21 (or 23, see below) 
trials at each of these frequencies. In each set, two of the frequencies were 
always the calculated hydrogen on-resonance frequency at zero laser 
power (zero detuning) and a far-off-resonance frequency (−200 kHz 
detuning at 243 nm), as used previously’. The other two frequencies in 
each set were chosen to address various detunings in the neighbourhood 
of the d-d resonance. The data are summarized in Table 1. The 
repetition of the points at −200 kHz and zero detuning was intended 
to address variations in laser power and trapping number between sets. 
The repetition at +25 kHz was a check of reproducibility. During the 
accumulation of data for each set, the four frequencies were interleaved 
in a varying order and the operators were blinded as to the identity of 
each frequency setting. The power of the enhancement cavity (about 
1 W) was monitored by measuring the transmitted power outside of the 
vacuum chamber (Fig. 1). Each set was preceded by a thermal cycle of 
the apparatus to regenerate the cryo-pumping surface. 

The background-corrected numbers in Table 1 are calculated from 
raw detector events using the measured, overall efficiencies of the 
silicon vertex detector. These efficiencies depend on the particular 
multivariate analysis algorithm that was used to distinguish antiproton 
annihilations from cosmic rays (Methods) in the relevant time window. 
The efficiencies and background rates are listed in Table 2. 


Table 1 | Antihydrogen atom counts 

Set    Laser detuning,  Number     Atoms lost during   Atoms lost during       Surviving   Initially trapped 
       D (kHz)          of trials  laser exposure, L   microwave exposure, M   atoms, S    atoms, N_i 
1      −200             21         [illegible]         383 ± 23                504 ± 25    894 ± 35 
1      −100             21         22 ± 9              415 ± 24                494 ± 24    931 ± 35 
1      0                21         264 ± 24            423 ± 24                217 ± 16    904 ± 38 
1      +100             21         75 ± 14             411 ± 23                424 ± 23    910 ± 35 
2      −200             21         26 ± 9              394 ± 23                466 ± 24    886 ± 34 
2      −25              21         113 ± 16            423 ± 24                326 ± 20    862 ± 35 
2      0                21         219 ± 22            390 ± 23                269 ± 18    878 ± 37 
2      +25              21         173 ± 20            438 ± 24                296 ± 19    907 ± 37 
3      −200             23         8 ± 7               354 ± 22                479 ± 24    841 ± 33 
3      0                23         303 ± 26            454 ± 25                248 ± 17    1,005 ± 40 
3      +50              23         176 ± 20            390 ± 23                339 ± 20    905 ± 37 
3      +200             23         36 ± 11             446 ± 24                459 ± 23    941 ± 35 
4      −200             21         [illegible]         525 ± 26                541 ± 25    1,073 ± 37 
4      −50              21         86 ± 15             475 ± 25                495 ± 24    1,056 ± 38 
4      0                21         274 ± 25            480 ± 25                275 ± 18    1,029 ± 40 
4      +25              21         202 ± 21            516 ± 26                305 ± 19    1,023 ± 38 
Total                   344        1,991               6,917                   6,137       15,045 

The integrated number of antihydrogen atoms is listed for each laser detuning (at 243 nm) within 
each set of trials. The background has been subtracted. Uncertainties quoted are one standard 
deviation (s.d.) counting errors. We refer to L as the ‘appearance signal’; S is used to infer the 
‘disappearance signal’. 


The number of initially trapped atoms N_i for a trial is unknown a 
priori, but was typically about 60 at the beginning of a measurement set. 
In Table 1, the total number of atoms for each group of trials is assumed 
to be the sum L + M + S of the numbers of atoms lost during laser (L) 
or microwave (M) exposure and the number of surviving atoms (S) 
(see Table 1). The trapping rate declined slowly but reproducibly 
during each set (Extended Data Fig. 1). The third set has 23 trials at each 
frequency because of a hardware failure in an early block of four trials; 
extra trials were added to compensate for the excluded data. 


Table 2 | Annihilation detector efficiencies and background rates 

                                     Efficiency  Uncertainty  Background rate  Uncertainty 
                                                              (10⁻³ s⁻¹)       (10⁻³ s⁻¹) 
Laser exposure (300 s)               0.472       0.001        1.04             0.11 
Microwave exposure (32 s)            0.801       0.002        33.0             0.6 
Release of surviving atoms (1.6 s)   0.852       0.002        191              1 

The detection efficiencies and background rates of the silicon vertex detector, as determined by 
the multivariate analysis (Methods), are listed for the three observation windows. The 1.6-s 
window during which the surviving atoms are released extends for 0.1 s after the magnet rampdown 
is complete. 


Fig. 2 | Hydrogenic energy levels. Calculated energies (E, for hydrogen) 
of the hyperfine sublevels of the 1S (bottom) and 2S (top) states are 
plotted against magnetic field strength. The centroid energy difference 
E_1S-2S = 2.4661 × 10¹⁵ Hz has been suppressed on the vertical axis. The 
vertical black arrow indicates the two-photon laser transition probed here 
(frequency f_d-d); the red arrow illustrates the microwave transition used to 
remove the 1S_c state atoms (frequency f_c-b). 

To examine the general features of the measurement results, we plot 
(Fig. 3a) the four datasets on one graph by using a simple scaling. The 
points at zero (on-resonance) and —200-kHz detuning (at which no 
signal is expected’), repeated for each set, are used for the scaling. For 
the laser exposure (‘appearance’) data, we define a scaled response at 
detuning D within each set: r_l(D) = L(D)/L(0). Similarly, for the 
surviving population (‘disappearance’ data), we use r_s(D) = [S(−200 kHz) 
− S(D)]/[S(−200 kHz) − S(0)]. The uncertainties shown are due to 
Poissonian counting errors only. For comparison, we also plot the 
results of a simulation!” based on the expected behaviour of hydrogen 
in our trap for a cavity power of 1 W, scaled to the zero-detuning data 
point. We see that the peak position and the width of the scaled spec- 
tral line are consistent with the calculation for hydrogen and that the 
experiment generally reproduces the predicted asymmetric line shape. 
There is also good agreement between the appearance and disappear- 
ance data (Fig. 3a). 
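
The scaling described above can be sketched for a single set. The example below applies the two definitions to the Set 2 central values of Table 1, read on the assumption that the fused digits in the extracted table are a value followed by its uncertainty (for example, 113 ± 16). The function and variable names are illustrative, and uncertainties are not propagated.

```python
# Scaled appearance and disappearance responses for one measurement set.
# Central values for Set 2 of Table 1: detuning (kHz) -> (L, S).
set2 = {
    -200: (26, 466),
    -25: (113, 326),
    0: (219, 269),
    25: (173, 296),
}

def scaled_appearance(data, detuning):
    """L(D) / L(0): laser-exposure losses relative to the on-resonance point."""
    return data[detuning][0] / data[0][0]

def scaled_disappearance(data, detuning, reference=-200):
    """[S(ref) - S(D)] / [S(ref) - S(0)]: missing survivors, scaled the same way."""
    s_ref = data[reference][1]
    return (s_ref - data[detuning][1]) / (s_ref - data[0][1])

appearance = {d: scaled_appearance(set2, d) for d in set2}
disappearance = {d: scaled_disappearance(set2, d) for d in set2}

for d in sorted(set2):
    print(f"{d:+5d} kHz  appearance={appearance[d]:.3f}  disappearance={disappearance[d]:.3f}")
```

By construction both responses equal 1 at zero detuning, and the disappearance response is 0 at the −200-kHz reference point.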

The simulation involves propagating the trapped atoms in an accu- 
rate model of the magnetic trap. When an atom crosses the laser 
beam, which has a waist of 200 µm at the cavity centre, we calculate 
the two-photon excitation probability, taking into account transit-time 
broadening, the a.c. Stark shift and the residual Zeeman effect. The sim- 
ulation determines whether excited atoms are lost owing to ionization 
or to a spin-flip event. The variable input parameters for the simulation 
are the cavity power and the laser frequency. The modelled response is 
asymmetric in frequency owing to the residual Zeeman effect'*. The 
width of the line, for our experimental parameters, is dominated by 
transit-time broadening, which contributes about 50 kHz full-width 
at half-maximum (FWHM) at 243 nm. For 1 W of cavity power, the 
a.c. Stark shift is about 2.5 kHz to higher frequency and the ionization 
contributes about 2 kHz to the natural line width. 
Fig. 3 | Spectral line of antihydrogen. a, The complete dataset, scaled as 
described in the text. The simulated curve (not a fit, drawn for qualitative 
comparison only) is for a stored cavity power of 1 W and is scaled to the 
data at zero detuning. ‘Appearance’ refers to annihilations that are detected 
during laser irradiation; ‘disappearance’ refers to atoms that are apparently 
missing from the surviving sample. The error bars are 1-s.d. counting 
uncertainties. b, Three simulated line shapes (for hydrogen) are depicted 
for different cavity powers to illustrate the effect of power on the size and 
the frequency at the peak. The width of the simulated line (FWHM) as a 
function of laser power is plotted in the inset. 


To make a more quantitative comparison of the experimental results 
with the expectations for hydrogen, it is necessary to scrutinize differ- 
ences between the four datasets. The overall response should be linear 
in the number of atoms addressed, so it is possible to normalize for this. 
However, the line width depends on the stored power in the cavity, as 
does the frequency of the peak (Fig. 3b). The cavity power is difficult 
to measure in our geometry because the amount of transmitted light 
depends sensitively on the small transmission from the output coupler 
(about 0.05%) and on absorption in the optical elements through which 
the transmitted light exits (Fig. 1). We observe that the transmitted 
power can degrade, owing to accumulated ultraviolet damage to the 
window and mirror substrate, whereas the finesse of the cavity does 
not change. 

A modelling approach that self-consistently accounts for fluctuations 
in experimental parameters is a simultaneous fit in which we allow the 
four sets to have distinct powers (P_1 to P_4), but the same frequency shift 
with respect to the hydrogen calculation (Methods). We require that 
the average powers for the appearance and disappearance data within a 
set are the same. We find the parameters that best reproduce the data to 
be: P_1 = 1135(50) mW, P_2 = 904(30) mW, P_3 = 1123(43) mW, 
P_4 = 957(31) mW and δf = −0.44 ± 1.9 kHz, where δf is the difference 
(at 243 nm) between the resonant frequency inferred from the fit and 
the resonant frequency of hydrogen expected for our system, both at 
zero power. The uncertainties represent the 68% confidence interval of 
a least-squares fit and do not take into account systematic uncertainties. 
The fit uses the five variables identified above, and the individual data 
points at each frequency are weighted by their Poissonian counting 
errors. We include an uncertainty of 3.8 kHz (Table 3) in the final 
resonance frequency to represent statistical and curve-fitting uncertainties. 


Table 3 | Summary of uncertainties 

Type of uncertainty                   Estimated    Comment 
                                      size (kHz) 
Statistical uncertainties             3.8          Poisson errors and curve fitting to measured data 
Modelling uncertainties               3            Fitting of simulated data to piecewise-analytic function 
Modelling uncertainties               1            Waist size of the laser, antihydrogen dynamics 
Magnetic-field stability              0.03         From microwave removal of 1S_c-state atoms (see text) 
Absolute magnetic-field measurement   0.6          From electron cyclotron resonance 
Laser-frequency stability             2            Limited by GPS clock 
d.c. Stark shift                      0.15         Not included in simulation 
Second-order Doppler shift            0.08         Not included in simulation 
Discrete frequency choice             0.36         Determined from fitting sets of pseudo-data 
of measured points 
Total                                 5.4 

The estimated statistical and systematic errors (at 121 nm) are tabulated. 
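
The total in Table 3 is consistent with combining the individual entries in quadrature, which assumes they are independent; the text does not state the combination rule, so that is an assumption of this sketch. The 3.8-kHz statistical entry (at 121 nm) also appears consistent with twice the 1.9-kHz fit uncertainty quoted at 243 nm.

```python
import math

# Entries of Table 3, in kHz at 121 nm. Combining them in quadrature
# (independent contributions) is an assumption, not stated in the text.
uncertainties_khz = [
    ("statistical", 3.8),
    ("modelling: fit to piecewise-analytic function", 3.0),
    ("modelling: laser waist, antihydrogen dynamics", 1.0),
    ("magnetic-field stability", 0.03),
    ("absolute magnetic-field measurement", 0.6),
    ("laser-frequency stability", 2.0),
    ("d.c. Stark shift", 0.15),
    ("second-order Doppler shift", 0.08),
    ("discrete frequency choice", 0.36),
]

total_khz = math.sqrt(sum(value ** 2 for _, value in uncertainties_khz))

# Fit uncertainty quoted at the 243-nm laser frequency, doubled to express
# it at the 121-nm transition frequency.
fit_uncertainty_243_khz = 1.9
fit_uncertainty_121_khz = 2 * fit_uncertainty_243_khz

print(f"quadrature total: {total_khz:.2f} kHz")
print(f"fit uncertainty at 121 nm: {fit_uncertainty_121_khz:.1f} kHz")
```

The three largest entries (statistical, curve-fitting model and laser-frequency stability) account for almost all of the total.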

Considering systematic effects, the microwave removal procedure 
for the 1S_c-state atoms provides a reproducibility check on the strength 
of the magnetic field at the centre of the trap. At the beginning of each 
data-taking shift, the magnetic field of the external solenoid magnet 
was reset to a standard value using an electron cyclotron resonance 
technique’®. For the complete dataset, we find that the variations in the 
magnetic field at the minimum field of about 1 T are about 3.2 × 10⁻⁵ T 
(1 s.d.). This corresponds to a resonance frequency shift’? of only about 
15 Hz at 243 nm for the d-d transition. (At 1 T, the c-c transition is 
about 20 times more sensitive to magnetic field shifts, which is why 
the d-d transition is more attractive here.) The laser frequency was 
tuned with respect to the minimum of the magnetic well, such that the 
resonance condition should be met in the centre of the trap for zero 
detuning in the limit of zero laser power. The accuracy of the magnetic- 
field determination corresponds to an uncertainty of 300 Hz in the 
243-nm laser frequency. 
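
The two magnetic-field numbers in this paragraph can be cross-checked against each other. The sketch below assumes that the d-d frequency shift is linear in the field near 1 T, takes the field variation as 3.2 × 10⁻⁵ T (the exponent is damaged in the extracted text; this reading is the one that reproduces the quoted 300 Hz), and uses the field uncertainty of 0.00063 T from the value 1.03285(63) T quoted later in the text.

```python
# Consistency check of the magnetic-field systematics, assuming a linear
# dependence of the d-d transition frequency on field near 1 T.
field_variation_t = 3.2e-5      # 1-s.d. field variation (exponent reconstructed)
shift_243_hz = 15.0             # corresponding shift at 243 nm, from the text

# Implied sensitivity of the d-d transition, in Hz per tesla at 243 nm.
sensitivity_hz_per_t = shift_243_hz / field_variation_t

field_uncertainty_t = 0.00063   # from the quoted field 1.03285(63) T
implied_243_hz = sensitivity_hz_per_t * field_uncertainty_t
implied_121_khz = 2 * implied_243_hz / 1e3   # same effect at the 121-nm transition

print(f"sensitivity: {sensitivity_hz_per_t:.3g} Hz/T at 243 nm")
print(f"absolute-field uncertainty: {implied_243_hz:.0f} Hz at 243 nm, "
      f"{implied_121_khz:.2f} kHz at 121 nm")
```

The result, about 300 Hz at 243 nm, matches the text, and the doubled value matches the 0.6-kHz entry in Table 3.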

Including all of the statistical and systematic uncertainties that we 
have identified (Table 3, for 121 nm), our fit of the experimental data 
to the hydrogen model yields 


f_d-d = 2,466,061,103,079.4(5.4) kHz 


The value (Methods) for hydrogen calculated at the minimum field 
in our system (1.03285(63) T) is 


f_d-d = 2,466,061,103,080.3(0.6) kHz 


where the uncertainty is determined by the experimental error in meas- 
uring the field. 
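
The comparison between the two frequencies above amounts to a difference and a ratio. The sketch below uses `Decimal` so that the 13-digit values are handled exactly; the variable names are illustrative.

```python
from decimal import Decimal

# Measured antihydrogen value and calculated hydrogen value of the d-d
# transition frequency, in kHz, with their quoted uncertainties.
f_antihydrogen_khz = Decimal("2466061103079.4")
u_antihydrogen_khz = Decimal("5.4")
f_hydrogen_khz = Decimal("2466061103080.3")
u_hydrogen_khz = Decimal("0.6")

difference_khz = f_antihydrogen_khz - f_hydrogen_khz

# Relative precision of the comparison, dominated by the antihydrogen uncertainty.
relative_precision = float(u_antihydrogen_khz / f_hydrogen_khz)

print(f"difference: {difference_khz} kHz")
print(f"relative precision: {relative_precision:.1e}")
print(f"consistent within uncertainty: {abs(difference_khz) < u_antihydrogen_khz}")
```

The difference of −0.9 kHz is well inside the 5.4-kHz uncertainty, and the ratio gives the quoted relative precision of about 2 parts in 10¹².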

Owing to the motion of the antihydrogen atoms in the inhomoge- 
neous trapping field, this comparison is necessarily model-dependent. 
We therefore conclude that the measured resonance frequency for this 
transition in antihydrogen is consistent with the expected hydrogen 
frequency to a precision of about 2 × 10⁻¹². Although the precision of 
our measurement is still a few orders of magnitude short of the state of 
the art with a cold hydrogen beam§, the modern frequency reference 
permits the accuracy of our experiment to exceed that achieved with 
trapped hydrogen’ as recently as the mid-1990s. We used a total of 
about 15,000 antihydrogen atoms to obtain this result, compared to 10” 
trapped atoms in the analogous matter experiment. Our dataset was 
accumulated over a period of ten weeks, illustrating that the antihy- 
drogen trapping procedure is robust and that systematic effects are 
manageable. ALPHA's emergent antihydrogen production, storage and 
detection techniques, together with advances in ultraviolet laser tech- 
nology and frequency metrology, pioneered by Hansch and colleagues, 
enable precision anti-atom spectroscopy. 

Precision experiments at the antiproton decelerator have recently 
constrained the properties of the antiproton through studies in Penning 
traps”? or with antiprotonic helium”*. For example, the antiproton 
charge-to-mass ratio is known to agree with that of the proton to 69 
parts per trillion”!, equivalent to an energy sensitivity of 9 x 10~°” GeV. 
The ratio of the antiproton mass to the electron mass has been shown 
to agree with its proton counterpart” to 8 x 107", and antihydrogen 
has been shown to be neutral” to 0.7 parts per billion. Our measure- 
ment of antihydrogen probes different and complementary physics at 
a precision of a few parts per trillion, or an energy level of 2 × 10⁻²⁰ 
GeV. This already exceeds the precision (4 × 10⁻¹⁹ GeV) in the mass 
difference of neutral kaons and antikaons?°, which has long been the 
standard for particle-physics tests of charge-parity-time invariance. 

Near-term improvements in the ALPHA-2 apparatus will include a 
larger waist size for the radiation in the optical cavity to reduce tran- 
sit-time broadening, operation at lower magnetic fields and operational 
improvements to accelerate data acquisition and to reduce statistical 
uncertainties. Future measurements will require an upgrade to our 
frequency reference to exceed a fractional precision of 8 × 10⁻¹³ 
(Methods). The rapid progress detailed here confirms that, in principle, 
there is nothing to prevent the achievement of hydrogen-like precision 
in antihydrogen and the associated very sensitive test of 
charge-parity-time symmetry in this system. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
018-0017-2. 
METHODS 

Time evolution of the dataset. The time evolution of the atoms detected in one 
of the datasets is depicted in Extended Data Fig. 1. 

Laser system for 243-nm light. A Toptica TA-FHG pro laser system uses a pair of 
frequency-doubling cavities to generate 150 mW of 243-nm light from a 972-nm 
extended cavity diode laser (ECDL). The 243-nm beam is mode-matched to the 
1S-2S enhancement cavity and sent along a 7-m-long path with active beam-pointing 
stabilization between the laser laboratory and the ALPHA-2 apparatus. The 
enhancement cavity is locked to the laser frequency using a single piezoelectric 
actuator located behind the output coupler mirror” to feedback on an error signal 
generated via the Pound-Drever-Hall technique”’. The light transmitted through 
the cavity is monitored using a photodiode that is located outside the vacuum 
system. The cavity has a measured finesse of 250 and achieves a circulating power 
of approximately 1 W. 

The 972-nm ECDL is frequency-stabilized (also using the Pound-Drever-Hall 
technique) to a Menlo Systems ultralow-expansion cavity via an acousto-optic 
modulator, which shifts the light from the 1S-2S transition frequency of the laser 
to the closest resonance of the ultralow-expansion cavity. The resonance frequency 
of the cavity is monitored continuously using a Menlo Systems femtosecond 
frequency comb, which is referenced to atomic time using a K+ K Messtechnik 
GPS-disciplined quartz oscillator. 

The measured difference between the ultralow-expansion resonance frequency 
and a comb line with a known frequency is fed forward to the control of the acousto- 
optic modulator with an averaging time of 20s to remove long-term drifts. The 
uncertainty of the frequency difference over the 20-s averaging period corre- 
sponds to an Allan deviation”® of 75 Hz at 972 nm (300 Hz at 243 nm). One of 
the frequency-comb counters is used to measure the signal from a Symmetricom 
CS4000 caesium clock to confirm correct operation of the quartz oscillator and the 
radio-frequency chain of the frequency comb. The count reaches a fractional Allan 
deviation of 8 × 10⁻¹³ after 1,000 s of averaging, which corresponds to fluctuations 
of 250 Hz at 972 nm (1 kHz at 243 nm). 
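
The conversions between fractional stability and absolute frequency in this paragraph can be reproduced as follows. The sketch takes the fractional Allan deviation as 8 × 10⁻¹³ (the exponent is damaged in the extracted text; this value is the one consistent with the quoted 250 Hz at 972 nm) and uses the factor of four between 972 nm and 243 nm that follows from the two frequency-doubling stages.

```python
# Converting fractional frequency stability into absolute fluctuations.
C_M_S = 299_792_458.0                # speed of light in m/s

f_972_hz = C_M_S / 972e-9            # optical frequency of the 972-nm laser
fractional_allan_deviation = 8e-13   # after 1,000 s of averaging (exponent reconstructed)

fluctuation_972_hz = fractional_allan_deviation * f_972_hz
# Two frequency-doubling stages multiply any frequency excursion by four.
fluctuation_243_hz = 4 * fluctuation_972_hz

# The short-term figure quoted for the 20-s averaging period scales the same way.
short_term_972_hz = 75.0
short_term_243_hz = 4 * short_term_972_hz

print(f"long-term:  {fluctuation_972_hz:.0f} Hz at 972 nm, {fluctuation_243_hz:.0f} Hz at 243 nm")
print(f"short-term: {short_term_972_hz:.0f} Hz at 972 nm, {short_term_243_hz:.0f} Hz at 243 nm")
```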

An independent, identical, 972-nm ECDL frequency stabilized to an independ- 

ent, identical, ultralow-expansion cavity is used to evaluate the short-term line 
width of the spectroscopy laser. The beat note generated between the two 972-nm 
lasers has a spectrum composed of individual lines, each with a line width of less 
than 1 Hz, within a 300-Hz (1.2 kHz at 243 nm) FWHM Gaussian envelope. The 
source of the broadening is thought to be acoustic noise within the laser laboratory; 
work is ongoing to reduce the broadening effect. 

Suppression of cosmic-ray background. To determine the signal events in the 
(a) 1.6-s, (b) 32-s and (c) 300-s observation windows, we require three different 
suppression techniques. (The 1.6-s window extends to 0.1 s after the magnet 
rampdown is complete.) We tune the multivariate analysis (MVA) for each of 
the three windows to optimize the statistical significance of the estimated signal. 
Annihilation events are distinguished from background events (primarily cosmic 
rays) by their distinctive topologies. Fourteen selection variables that are sensitive 
to the difference between annihilation and background events were used as inputs 
to an MVA package"®. The variables included are: (i) the total number of channels 
registering ‘hits’ by charged particles; (ii) the radial coordinates of the reconstructed 
annihilation vertex; (iii) the sum of the squared residual distances of hits from a 
fitted straight line; six topological variables (iv–ix); and five additional variables 
(x–xiv). The topological variables are: (iv) a sphericity variable; (v) the cosine of the 
angle between the event axis and the detector axis; (vi) the angle between the event 
axis and the vertical direction in the x-y plane; (vii) the number of reconstructed 
tracks; (viii) the number of three-hit combinations used as track candidates; 
(ix) the distance of closest approach of the tracks. The additional variables are: 
(x) the minimum and (xi) mean values of the track radius in canonical form; 
(xii) the minimum and (xiii) mean values of the pitch of the helical track in 
canonical form; and (xiv) an integer sum of the sense of curvature (left = −1 or 
right = +1) for all of the tracks in the event. 

The signal data and background data used for MVA training and testing com- 
prise a set of 580,846 annihilation events and 3,740,613 background events. The 
signal events were produced during antiproton and positron mixing in the appa- 
ratus and contain less than 1% background. Background events were collected 
during times when there were no antiprotons in the apparatus. 

The 1.6-s observation window. A classifier cut was chosen to optimize the significance 
for an expected 200 counts of signal and 350 counts of background. The analysis 
gives a background rate of 0.191 ± 0.001 s⁻¹ and an efficiency of 0.852 ± 0.002 
(statistical error only) annihilations per detector trigger. 

The 32-s observation window. The analysis was chosen to optimize the 
significance for an expected 400 counts of signal and 3,500 counts of background. 
The analysis gives a background rate of 0.033 ± 0.0006 s⁻¹ and an efficiency of 
0.801 ± 0.002 (statistical error only) annihilations per detector trigger. 


The 300-s observation window. A classifier cut was chosen to optimize the significance 
for an expected 250 counts of signal and 330,000 counts of background. 
The analysis gives a background rate of 0.0010 ± 0.0001 s⁻¹ and an efficiency of 
0.472 ± 0.001 (statistical error only) annihilations per detector trigger. 
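
The text does not state which figure of merit the classifier cut optimizes. The sketch below assumes the common choice S/√(S+B) and scans candidate cuts for given expected signal and background counts. The function `best_cut`, the toy score lists and the figure of merit are illustrative assumptions, not the MVA package used in this work.

```python
import math


def significance(n_sig: float, n_bkg: float) -> float:
    """Assumed figure of merit: S / sqrt(S + B)."""
    total = n_sig + n_bkg
    return n_sig / math.sqrt(total) if total > 0 else 0.0


def best_cut(sig_scores, bkg_scores, exp_sig, exp_bkg):
    """Scan every observed score as a threshold; events with score >= cut pass.

    Returns the (cut, significance) pair with the highest significance.
    """
    best = (None, -1.0)
    for cut in sorted(set(sig_scores) | set(bkg_scores)):
        eff_sig = sum(s >= cut for s in sig_scores) / len(sig_scores)
        eff_bkg = sum(b >= cut for b in bkg_scores) / len(bkg_scores)
        z = significance(exp_sig * eff_sig, exp_bkg * eff_bkg)
        if z > best[1]:
            best = (cut, z)
    return best


# Hypothetical classifier outputs for training samples (not real data).
toy_signal = [0.9, 0.8, 0.7, 0.2]
toy_background = [0.1, 0.2, 0.3, 0.75]

# Expected counts quoted for the 1.6-s window: 200 signal, 350 background.
cut, z = best_cut(toy_signal, toy_background, exp_sig=200, exp_bkg=350)
print(cut, round(z, 3))
```

A different window changes only the expected counts passed to `best_cut`, which is why each window ends up with its own cut and efficiency.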
Fitting the data using the hydrogen simulation. The build-up of laser power in 
the enhancement cavity is one of the primary experimental parameters that influ- 
ence the data in Table 1. The main effect of a change in laser power is on the ampli- 
tude of the measured line, but there is also an effect on the peak position through 
the a.c. Stark shift and on the line width owing to depletion effects. In our set-up, 
there is considerable uncertainty in measuring the absolute intra-cavity laser 
power; relative measurements show that although the constancy of laser power 
within any single measurement set is good, there are variations between the sets. 

To reflect this experimental reality in our analysis of the data, the χ² statistic for 
the full dataset is minimized with respect to a function that, aside from an overall 
frequency shift, allows a unique laser power in each set and incorporates the effects 
of those laser powers on the amplitude, line width and line centre based on the 
simulation of hydrogen in our experiment. 

The construction of the fit function therefore starts by running a detailed simu- 
lation of hydrogen in the ALPHA-2 magnetic trap for an array of input laser powers 
and frequencies that spans the experimentally relevant values, in this case from 
−200 kHz to +300 kHz in laser detuning and from 0.7 W to 1.25 W in laser power. 
We simulate a total of 365,000 atoms in this array, after which we interpolate to 
obtain continuous values in both laser detuning and power. The interpolation in 
power is a linear regression at each detuning in the array, based on the observed 
linear behaviour. For interpolation in detuning, a fit to a piecewise-analytic func- 
tion that provides a good approximation to the simulation data is used. An error 
associated with this fit is included in Table 3. The discrete simulated points and 
the smooth interpolation are plotted in Extended Data Fig. 2. 
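
The power interpolation described above (a linear regression at each detuning) can be sketched as follows. The table of simulated responses is invented for illustration, and `fit_line` and `build_power_interpolator` are hypothetical helper names rather than part of the actual analysis code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_power_interpolator(powers, table):
    """table maps each simulated detuning to responses, one per power value."""
    fits = {detuning: fit_line(powers, resp) for detuning, resp in table.items()}

    def response(detuning, power):
        slope, intercept = fits[detuning]
        return slope * power + intercept

    return response


# Made-up simulation output: response versus laser power (W) at two detunings (kHz).
powers_w = [0.7, 1.0, 1.25]
simulated = {0.0: [0.14, 0.20, 0.25], 100.0: [0.07, 0.10, 0.125]}

response = build_power_interpolator(powers_w, simulated)
print(response(0.0, 0.9))
```

Interpolation in detuning would then be applied on top of these per-detuning lines; that step uses a piecewise-analytic fit in the text and is not reproduced here.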

Calculation of the resonant frequency for hydrogen. The frequency f_d–d is calculated 
from corrections to the centroid-to-centroid frequency f_1S–2S: 
where h is Planck’s constant, E_HF(n) is the hyperfine splitting of the state with 
principal quantum number n, μ_e and μ_p are the magnitudes of the magnetic moments 
of the electron and proton, respectively, μ is the reduced mass of the electron, m is 
the electron mass, e is the elementary charge, a_0 is the Bohr radius for an infinite-mass 
nucleus and B is the magnetic field. 

The first correction describes the difference in the hyperfine splittings of the 
1S and 2S states. The second (third) correction describes the difference in the 
magnetic moment of the electron (proton) in these states. The fourth correction 
describes the difference in the diamagnetic shift. 


The magnetic moment of the bound electron is (equation (84))?? 
free a at 1 2 a a? m 
where α is the fine-structure constant, μ_e^free is the free-electron dipole moment and 
M is the proton mass; the dependence on n is described elsewhere*”*!. The magnetic 
moment of the bound proton is (equation (87))”? 
where μ_p^free is the free-proton dipole moment. 


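Using current CODATA values” for the fundamental constants, the frequency is 

$$f_{\mathrm{d-d}} = f_{1\mathrm{S}-2\mathrm{S}} - 310{,}712.229\ \mathrm{kHz} + 186.071\,B\ \mathrm{kHz\,T^{-1}} - 0.283\,B\ \mathrm{kHz\,T^{-1}} + 387.678\,B^{2}\ \mathrm{kHz\,T^{-2}}$$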
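
As a plausibility check on the four numerical coefficients, the sketch below recomputes their magnitudes from tabulated constants. It rests on assumptions not stated in the text: that the leading binding corrections to the magnetic moments scale as 1/n², that the d state lies one quarter of the hyperfine splitting above the centroid, and that the diamagnetic term uses ⟨r²⟩ = 3a² (1S) and 42a² (2S) with a reduced-mass-corrected Bohr radius a. The constants are typed in by hand and the function names are illustrative.

```python
import math

# CODATA-2014-style constants in SI units, entered by hand for this sketch.
ALPHA = 7.2973525664e-3        # fine-structure constant
H = 6.626070040e-34            # Planck constant, J s
E_CHARGE = 1.6021766208e-19    # elementary charge, C
A0 = 0.52917721067e-10         # Bohr radius for an infinite-mass nucleus, m
M_E = 9.10938356e-31           # electron mass, kg
ME_OVER_MP = 5.44617021352e-4  # electron-to-proton mass ratio
MU_E = 9.284764620e-24         # magnitude of free-electron moment, J/T
MU_P = 1.4106067873e-26        # free-proton moment, J/T
HFS_1S_KHZ = 1_420_405.751768  # 1S hyperfine splitting, kHz
HFS_2S_KHZ = 177_556.8343      # 2S hyperfine splitting, kHz


def n_scaling(n_low: int = 1, n_high: int = 2) -> float:
    """Assumed 1/n^2 scaling of the leading binding corrections."""
    return 1.0 / n_low**2 - 1.0 / n_high**2


def hyperfine_khz() -> float:
    """Quarter of the difference of the 2S and 1S hyperfine splittings."""
    return (HFS_2S_KHZ - HFS_1S_KHZ) / 4.0


def electron_khz_per_tesla() -> float:
    """Leading-order difference of the bound-electron moment, 2S minus 1S."""
    lead = ALPHA**2 / 3 - ALPHA**3 / (4 * math.pi) - ALPHA**2 * ME_OVER_MP / 2
    return n_scaling() * lead * MU_E / H / 1e3


def proton_khz_per_tesla() -> float:
    """Leading-order difference of the bound-proton moment (magnitude only)."""
    return n_scaling() * (ALPHA**2 / 3) * MU_P / H / 1e3


def diamagnetic_khz_per_tesla2() -> float:
    """Difference of e^2 B^2 <r^2> / (12 mu) between 2S and 1S, per B^2."""
    mu = M_E / (1 + ME_OVER_MP)  # reduced mass
    a = A0 * M_E / mu            # Bohr radius corrected for finite proton mass
    return (42 - 3) * E_CHARGE**2 * a**2 / (12 * mu * H) / 1e3


print(hyperfine_khz(), electron_khz_per_tesla(),
      proton_khz_per_tesla(), diamagnetic_khz_per_tesla2())
```

Under these assumptions the hyperfine and diamagnetic terms agree with the quoted coefficients to the printed precision, while the electron term comes out a few hertz per tesla lower because higher-order terms are omitted.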


Sample size. No statistical methods were used to predetermine sample size. 
Data availability. The datasets generated and analysed during this study are available 
from the corresponding author on reasonable request. 

As conventional electronics approaches its limits', nanoscience 
has urgently sought methods of fast control of electrons at 
the fundamental quantum level”. Lightwave electronics*>—the 
foundation of attosecond science*—uses the oscillating carrier 
wave of intense light pulses to control the translational motion of 
the electron’s charge faster than a single cycle of light*-'°. Despite 
being particularly promising information carriers, the internal 
quantum attributes of spin’® and valley pseudospin!”-”! have 
not been switchable on the subcycle scale. Here we demonstrate 
lightwave-driven changes of the valley pseudospin and introduce 
distinct signatures in the optical readout. Photogenerated electron- 
hole pairs in a monolayer of tungsten diselenide are accelerated and 
collided by a strong lightwave. The emergence of high-odd-order 
sidebands and anomalous changes in their polarization direction 
directly attest to the ultrafast pseudospin dynamics. Quantitative 
computations combining density functional theory with a non- 
perturbative quantum many-body approach assign the polarization 
of the sidebands to a lightwave-induced change of the valley 
pseudospin and confirm that the process is coherent and adiabatic. 
Our work opens the door to systematic valleytronic logic at optical 
clock rates. 

In a crystalline solid, a set of atoms forming the unit cell repeats itself 
on a periodic lattice. The Bloch theorem” states that a single-electron 
wavefunction ψ_αk is composed of a plane wave with wave vector k and 
a lattice-periodic part u_αk: 

$$\psi_{\alpha\mathbf{k}}(\mathbf{r}) = u_{\alpha\mathbf{k}}(\mathbf{r})\,\exp(i\mathbf{k}\cdot\mathbf{r}) \qquad (1)$$
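
A short numerical illustration of equation (1): for any lattice-periodic u, translating the Bloch function by one lattice constant multiplies it by the phase factor exp(ika). The one-dimensional setting, the lattice constant and the particular form of `u` are arbitrary choices made for this sketch.

```python
import cmath
import math

A = 1.0  # lattice constant in arbitrary units (assumed)


def u(x: float) -> float:
    """Hypothetical lattice-periodic part, chosen so that u(x + A) == u(x)."""
    return 1.0 + 0.3 * math.cos(2 * math.pi * x / A)


def psi(x: float, k: float) -> complex:
    """One-dimensional Bloch wavefunction psi_k(x) = u(x) * exp(i k x)."""
    return u(x) * cmath.exp(1j * k * x)


k = 0.7 * math.pi / A
x = 0.123
lhs = psi(x + A, k)                      # wavefunction one lattice site further
rhs = cmath.exp(1j * k * A) * psi(x, k)  # same point times the Bloch phase
bloch_error = abs(lhs - rhs)
print(bloch_error)
```

The orbital and spin character discussed in the text lives entirely in u, which is why it is unaffected by the translation.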


The plane-wave part describes the effective motion of electrons 
throughout the crystal, whereas u_αk(r) encodes the contribution from 
atomic orbitals and spins (quantum number α). In conventional electronics, 
quasi-static fields cause a tiny imbalance of the number of 
electrons moving in the +k and −k directions. This leads to incoherent 
charge currents. In quantum electronics, in contrast, one tries to 
coherently transport the quantum information stored in atomic orbitals 
or spins before coherence-destroying scattering becomes effective’>. 
Recently, a carrier wave of intense light pulses has been exploited as 
an alternating-current bias to transport electrons faster than scatter- 
ing occurs, yielding intriguing quantum effects, such as dynamical 
Bloch oscillations®”!3, electron-hole recollisions®!?”4, electronic 
quantum interference®’, and interband excitations®!°. The resulting 
high-harmonic radiation®”*!°-> has been used to retrieve the electronic 
band structure” optically. Despite the first evidence of Berry’s 
phase effects'?®, resulting directly from the topological k dependence 
of u_αk, most lightwave-driven transport of electric charge has focused 
on controlling k in the plane-wave part of the Bloch wavefunction. 
Independently, quantum information processing” has been pursued 
using internal degrees of freedom encoded in u_αk, such as spin’® or 
valley pseudospin!’~*!?’-3°. Monolayers of transition-metal dichalcogenides 
form an ideal testbed for such concepts because they feature 
two separate band minima, called valleys'”-°, that can be selectively 
excited by optical means'®*!*?”~”?. The coherence of the valley pseudospin 
has been shown to be conserved even in steady-state photoluminescence 
experiments””-”’, rendering this degree of freedom a promising 
information carrier’7!9?!?’*°. Yet its subcycle manipulation remains 
an open challenge. 

Here we demonstrate that the valley pseudospin in a transition-metal 
dichalcogenide monolayer can be changed by lightwave-driven intra- 
band transport within a few femtoseconds. To this end, we accelerate 
optically prepared coherent electron-hole pairs in monolayer tungsten 
diselenide (WSe₂) by intense, phase-stable multi-terahertz waveforms”. 
Recolliding electrons and holes emit their kinetic energy in high-order 
sidebands, the odd orders of which directly reflect the electron-hole 
pairs’ valley pseudospin. Pairs that are selectively created in a single 
valley can be transferred partly into the opposite valley by the strong 
terahertz field, leaving a unique fingerprint in elliptically polarized 
sidebands. 

Monolayers of WSe₂ (Fig. 1a) feature an electronic band structure 
with a direct energy gap separating the filled valence band from the 
empty conduction band at the corners of the hexagonal unit cell in 
momentum space, called the Brillouin zone’”~!? (Fig. 1b). Owing to 
the honeycomb crystal lattice (Fig. 1a), the Brillouin zone contains two 
inequivalent corners!”-!°, the K and the K’ points. In the vicinity of K 
(or K’), the electronic wavefunction u_αk of the highest valence band is 
primarily composed of atomic 5d orbitals of tungsten with a magnetic 
quantum number’®!? m = 2 (or −2). A sign flip of m implies a time 
inversion of the electronic wavefunction u_αk between K and K’ (insets 
in Fig. 1b). The conduction-band wavefunctions at the K and K’ 
points are also time-reversal pairs. This is why circularly polarized light can 
prepare electron-hole pairs selectively at the K or the K’ point, depending 
on helicity!”~*”-*?, and the valley polarization may be described 
by a spin-like quantity called valley pseudospin. In the following, we 
determine whether intense terahertz pulses can drive electron transport 
far enough to induce superpositions of K and K’ states and ultimately 
to flip the valley pseudospin. 

We start with a linearly polarized 100-fs near-infrared pulse to 
create electron-hole pairs in bulk and monolayer WSe₂ on a diamond 
substrate (Extended Data Fig. 1). Being a superposition of 
right- (σ+) and left-circularly (σ−) polarized light, this preparation 
pulse generates coherent electron-hole pairs in both the K and 
K’ valleys. Coulomb attraction causes the quasiparticles to form a 
series of atom-like bound states, called excitons”. Our preparation 
pulse is resonant with the 1s A series (transition from the uppermost 
valence band to the conduction band) exciton!*!”-*! (with photon 
energies 1.621 eV for the bulk and 1.665 eV for the monolayer). 
Simultaneously, an intense terahertz transient (inset in Fig. 1c; centre 
frequency, ν_THz = 40 THz) coherently drives the excitonic polarization, 
leading to real-space acceleration followed by a recollision of 
the constituent electron and hole. This dynamics is reminiscent of the 
three-step model underlying attosecond pulse generation in atomic 
gases*. Upon recollision, the electron can recombine with the hole, 
emitting a photon with an energy corresponding to the bandgap plus 
the kinetic energy imparted by the acceleration. 

Fig. 1 | High-odd-order sideband generation in monolayer WSe₂. 
a, Crystal structure of a WSe₂ monolayer. Tungsten atoms are depicted as 
grey spheres, selenium atoms as brown spheres. b, Hexagonal Brillouin zone with 
the spin-split valence and conduction band edges at the inequivalent 
K and K’ points. The main orbital contributions to the top of the valence 
bands, as calculated from density functional theory, are shown below 
(the green–yellow colour scale depicts the sign of the imaginary part). 
Red (blue) arrows in a and b indicate the zigzag (armchair) direction. 
c, High-order sideband intensity, I_HSG, measured from bulk WSe₂ (black) 

In bulk WSe₂, this mechanism leads to the generation of high-order 
sidebands accompanying the interband 1s exciton line (Fig. 1c, black 
curve). Because the bulk crystal is inversion-symmetric, high-order 
sideband generation (HSG) is independent of the polarity of the driving 
field, and collisions between the excitonic constituents occur in every 
half-cycle. In the frequency domain, this periodicity causes even sideband 
orders with a spectral separation of 2ν_THz. 

Figure 1c also depicts an observation of high-order sideband emission 
from a monolayer material (red curve), the first to our knowledge. 
Despite the ultimately low sample thickness, harmonic sidebands 
up to order n = 11 clearly rise above the noise floor. Remarkably, 
the sideband spectrum differs qualitatively from the bulk case. 
Besides even orders, we also detect odd orders (n > 3) of comparable 
strength, a result that has neither been predicted nor observed 
before. Phenomenologically, the emergence of HSG spaced by ν_THz 
implies that the time structure of the light emission is periodic only after a 
full oscillation cycle of the terahertz wave, that is, it differs for positive 
and negative terahertz half-cycles. 
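
The link between time-domain periodicity and sideband parity can be illustrated with a toy Fourier analysis. If the emission repeats identically every half-cycle, only even multiples of the driving frequency appear; any difference between positive and negative half-cycles introduces odd multiples. The emission envelope below is an invented model for illustration, not the microscopic simulation of this work.

```python
import cmath
import math

N = 256  # samples per terahertz driving period


def harmonic_amplitude(signal, order):
    """Magnitude of the Fourier coefficient at `order` times the fundamental."""
    total = sum(s * cmath.exp(-2j * math.pi * order * i / len(signal))
                for i, s in enumerate(signal))
    return abs(total) / len(signal)


def emission(asymmetry):
    """Toy emission envelope over one driving period.

    asymmetry = 0: identical bursts in both half-cycles.
    asymmetry > 0: the burst in the negative half-cycle is weaker.
    """
    samples = []
    for i in range(N):
        field = math.sin(2 * math.pi * i / N)
        weight = 1.0 if field >= 0 else 1.0 - asymmetry
        samples.append(weight * field**2)
    return samples


symmetric = emission(0.0)
asymmetric = emission(0.4)

odd_sym = harmonic_amplitude(symmetric, 1)    # vanishes: spacing is 2 nu_THz
even_sym = harmonic_amplitude(symmetric, 2)
odd_asym = harmonic_amplitude(asymmetric, 1)  # non-zero: spacing is nu_THz
print(odd_sym, even_sym, odd_asym)
```

In this picture the bulk crystal corresponds to the symmetric case and the monolayer driven along the zigzag direction to the asymmetric one.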

For a microscopic understanding, we first note that the strongest 
odd-order sideband intensity occurs when electron-hole pairs are 
accelerated along the zigzag crystal direction (Fig. 1c, red curve), 
whereas HSG is strongly suppressed for terahertz fields along the armchair 
direction (Fig. 1c, blue curve). Because inversion symmetry along 
the zigzag direction is broken only in momentum space (Fig. 1b, red 
arrow), not in real space (Fig. 1a, red arrow), our observation proves 
that odd-order sideband generation must be related to intraband transport, 
which is best understood in momentum space. The terahertz field 
drives electrons and holes within the bands in which the near-infrared 
pulse has prepared them. The emergence of odd-order HSG indicates 
that the coherent electron-hole pairs traverse a large fraction of the 
Brillouin zone to reach wave vectors k at which the effective mass is 
no longer isotropic. Hence, the carrier dynamics depends on the sign 
of the driving waveform and a spectral modulation of the sideband 
radiation with ν_THz results. 

and monolayer WSe₂ in the zigzag (red) and the armchair direction (blue, 
shifted downwards by one order of magnitude for better visibility). In 
the zigzag direction, even- and odd-order sidebands exhibit comparable 
intensities, while odd orders are almost completely suppressed in the 
armchair direction. We note that the frequency axis of the bulk spectrum 
has been shifted by 9 THz owing to the different excitation frequencies. 
The inset shows the electro-optically sampled multi-terahertz field, 
featuring a frequency of ν_THz = 40 THz and a peak field strength in air of 
E_THz = 18 MV cm⁻¹. 

Fig. 2 | Electronic structure of WSe₂ and quantum theory of high-order 
sideband generation. a, b, Band structure of monolayer WSe₂ calculated 
by DFT for the zigzag (a) and the armchair (b) directions (dark colours, 
spin-up bands; light colours, spin-down bands). There is no inversion 
symmetry about the K/K’ valleys in the zigzag direction, supporting the 
generation of odd-order sidebands. The bands in the armchair direction 
are inversion-symmetric, inhibiting odd-order HSG. The red and blue 
arrows at the bottom of each panel indicate the high-symmetry directions 
highlighted in Fig. 1. c, Computed intensity spectrum I_HSG for lightwave-driven 
electron-hole recollision along the zigzag (red) and armchair (blue, 
down-shifted along the intensity axis by one order of magnitude for better 
visibility) directions. 

Fig. 3 | High-order sideband polarization for different crystal 
orientations. a, b, Simulated high-order sideband intensity in the 
polarization basis parallel (θ = 0°, light colours) and perpendicular 
(θ = 90°, dark colours) to the linearly polarized excitation and driving 
fields, taking into account the polarization contributions of opposite 
helicity in the different valleys. Individual sideband orders are multiplied 
with the indicated multiplication factors. Even and odd orders are cross-polarized 
in the zigzag direction (a), while the armchair direction (b) 
supports only strong even orders with a parallel polarization. 
c, d, Measured polarization-resolved intensity I_HSG normalized to the 
fourth order for a multi-terahertz field pointing along the zigzag (c) 
and the armchair (d) directions. The zigzag direction confirms cross-polarized 
even- and odd-order sidebands, while all orders in the armchair 
direction exhibit a parallel polarization (odd orders suppressed by crystal 
symmetry). 

Fig. 4 | Intervalley mixing and lightwave valleytronics. a, Measured 
polarization-resolved intensity I_HSG of individual sideband orders 
(normalized, coloured data points) for circularly polarized excitation 
(normalized, black data points) of the K valley only. The terahertz-driven 
intraband transport (E_THz = 23 MV cm⁻¹, along the zigzag direction) 
into the K’ valley adds polarization contributions of opposite helicity, 
resulting in an overall elliptical polarization. b, Computed polarization of 
high-order sidebands (orders four to seven, solid lines, same colours as in 
a) following a valley-selective excitation in the K valley. c, d, Computed 
distribution of the coherent electron-hole polarization p_k (colour scale) 
in reciprocal space at the time of small (c) and maximal displacement (d); 
conditions match the experimental analysis in a. Near a zero-crossing of 
the electric field (indicated in the inset of c), the coherent electron-hole 
pairs reside in the K valley (borders indicated by red dashed lines). In 
the following half-cycle (d), a pronounced ledge of electrons and holes is 
driven from the K to the K’ valley, underpinning subcycle valley transport. 
e, Computed polarization following a subcycle excitation by a 5-fs optical 
pulse and terahertz acceleration by an electric field of 23 MV cm⁻¹. 
The maximum of the polarization is transferred far into the K’ valley, 
demonstrating that the valley pseudospin can indeed be switched by 
strong terahertz fields. 

We corroborate this conjecture by computing the full band structure 
of the WSe₂ monolayer with density functional theory (DFT; 
see Methods). For large excursions from the K or K’ points, the 
electron-hole pairs do indeed experience strong asymmetries of 
the band structure along the zigzag direction (Fig. 2a). In contrast, 
the band structure in the armchair direction is symmetric about the 
K and the K’ points (Fig. 2b). Consequently, the sideband spectrum 
is expected to show hardly any odd orders, as indeed is observed in 
Fig. 1c. 

Nevertheless, this explanation is incomplete because the linearly 
polarized preparation pulse populates both K and K’ valleys equally. 
Since the bands in these valleys are mirror images of each other (see 
Fig. 2a), the plane-wave part of the electron-hole dynamics of the 
ensemble is effectively symmetric for positive and negative terahertz 
half-cycles, which forbids the generation of odd-order sidebands. 
This obvious contradiction with the experiment can be resolved 
only if the valley pseudospin, encoded in the atomic wavefunction 
u_αk, is taken into account. Therefore, we simulate the entire life cycle 
of the electron-hole pairs by combining a DFT calculation of the 
electronic band structure and the Bloch states with cluster expansion”? 
to solve the lightwave-driven non-perturbative many-body 
dynamics. We quantitatively include the electron-hole Coulomb 
interaction, pseudospin and dipole-matrix elements in two-dimensional 
computations for conditions matching the experiment 
(see Methods). 

The computed high-order sideband spectra (Fig. 2c) reproduce the 
experimental results very well, including the emergence of even and 
odd orders as well as details, such as the suppression of the third-order 
sideband (see Methods). Most importantly, the presence of odd-order 
sidebands is observable in our setting only if the valley pseudospin is 
accounted for. The different pseudospin in the K and K’ valleys imparts 
different helicities on the sideband radiation resulting from recom- 
bination within the respective valleys. This fact breaks the symme- 
try for positive and negative terahertz field polarities and enables the 
generation of odd orders. Interestingly, the pseudospin tagging of the 
K and K’ excitations causes a characteristic polarization state of the 
sidebands (Fig. 3a and b). Straightforward vector algebra (see Methods) 
shows that summing up the counter-circularly rotating field vectors 
of odd-order sidebands from the K and K’ valleys yields a field that is 
linearly polarized perpendicular to the excitation pulse. Analogously, 
one can show that even-order sidebands have to be polarized parallel 
with the excitation light. 
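
The vector algebra referred to above can be sketched with Jones vectors. Adding two counter-rotating circular fields of equal amplitude gives linear polarization along x when they are in phase and along y when one of them carries a sign flip. Taking x as the excitation polarization axis and attributing the sign flip to odd orders are assumptions made for this illustration; the derivation itself is in the Methods.

```python
import math

SQRT2 = math.sqrt(2.0)
SIGMA_PLUS = (1 / SQRT2, 1j / SQRT2)    # Jones vector of one circular helicity
SIGMA_MINUS = (1 / SQRT2, -1j / SQRT2)  # Jones vector of the opposite helicity


def superpose(sign: int):
    """Field from valley K (sigma+) plus `sign` times the field from K' (sigma-)."""
    return tuple(p + sign * m for p, m in zip(SIGMA_PLUS, SIGMA_MINUS))


even_like = superpose(+1)  # in-phase sum: polarized along x (parallel)
odd_like = superpose(-1)   # sign-flipped sum: polarized along y (perpendicular)
print(even_like, odd_like)
```

Both sums have the same total amplitude; only the axis of the linear polarization changes.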

We test this prediction by measuring the intensity of even and odd 
sideband orders with an analyser set at angle θ (Fig. 3c and d). Here 
θ = 0° (or 90°) denotes a polarization parallel (or perpendicular) to 
the near-infrared and the terahertz fields. As expected in the zigzag 
direction, even-order sidebands are indeed polarized parallel to the 
excitation light, while odd orders are perpendicularly polarized (Fig. 3a 
and c). By crystal symmetry, the armchair direction supports only even 
orders, which are polarized at θ = 0° (Fig. 3b and d). This theory-experiment 
comparison further corroborates that odd-order HSG is a fingerprint 
of the valley coherence, which prevails even under atomically 
strong fields. 

To trace precisely how the terahertz field changes the valley pseudospin, 
we selectively excite the K valley by a σ+ polarized excitation 
pulse. If the electron-hole pairs recollide within the same valley, their 
sideband emission is expected to remain polarized with the same 
helicity. Figure 4a shows the σ+ polarization of the near-infrared 
preparation pulse (black curve), as well as the polarization state of 
the sideband orders n = 4 to 7, recorded by measuring the intensity 
as a function of the angle of a rotating analyser. In stark contrast to the 
incident pulse, the sidebands are polarized strongly elliptically, with 
their principal axis aligned at a large angle θ (Fig. 4a). Our many-body 
computations (Fig. 4b) produce essentially the same polarization 
and quantitatively connect it with coherent K-to-K’ valley transport, 
which moves the electron-hole pairs into a superposition of both 
valleys contributing opposite helicities to the sideband emission. 
The ellipticity of the resulting polarization is a measure of the yield 
of the intervalley transfer whereas θ is set by the relative phase of 
the wave functions in the K and the K’ valley. Both dynamical and 
geometric phases contribute. In the future, the delicate variations in 
the polarization direction of different orders may be quantitatively 
evaluated to extract geometric phases acquired during lightwave 
acceleration within the bands. 
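
How ellipticity and orientation encode the intervalley transfer can be made concrete with a rotating-analyser model. The sketch assumes a field a·σ+ + b·exp(iφ)·σ−, where a and b stand for the amplitudes emitted from the two valleys and φ for their relative phase. These symbols and the function names are introduced here for illustration only.

```python
import cmath
import math


def analyser_intensity(theta: float, a: float, b: float, phi: float) -> float:
    """Intensity behind a linear analyser at angle theta (Jones calculus)."""
    ex = (a + b * cmath.exp(1j * phi)) / math.sqrt(2)
    ey = 1j * (a - b * cmath.exp(1j * phi)) / math.sqrt(2)
    return abs(ex * math.cos(theta) + ey * math.sin(theta)) ** 2


def ellipse_parameters(a: float, b: float, phi: float):
    """Major-axis angle and minor-to-major field-amplitude ratio."""
    return phi / 2.0, abs(a - b) / (a + b)


# Purely circular input (no transfer): the analyser signal does not depend on angle.
circular = [analyser_intensity(t, 1.0, 0.0, 0.0) for t in (0.0, 0.5, 1.0)]

# Partial transfer with a relative phase of 60 degrees.
angle, ratio = ellipse_parameters(1.0, 0.5, math.pi / 3)
i_max = analyser_intensity(angle, 1.0, 0.5, math.pi / 3)
i_min = analyser_intensity(angle + math.pi / 2, 1.0, 0.5, math.pi / 3)
print(circular, angle, ratio, i_max, i_min)
```

In this model the analyser signal is ½[a² + b² + 2ab·cos(2θ − φ)], so the major axis sits at φ/2 and the axis ratio depends only on the amplitude ratio b/a.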

To visualize the intervalley transport, we calculate the full two-dimensional 
distribution of the coherent electron-hole polarization 
p_k in reciprocal space, which is proportional to the density of coherent 
electron-hole pairs (see Methods). At a zero-crossing of the field, the 
electrons and holes reside in the K valley (Fig. 4c) while the following 
half-cycle accelerates the leading edge of p_k into the K’ valley within 
a few femtoseconds (Fig. 4d), attesting to subcycle coherent intervalley 
transport. The distribution function shows major spreading and 
deformation from the single-peak excitonic packet, mainly due to 
ionization-induced deformation of the wave packet, whereas a peak 
close to K is caused by the continued optical excitation during the 
terahertz acceleration. 

A key question is how far intervalley transport may be ultimately 
driven. For a realistic scenario, we compute the distribution of p_k for 
a terahertz peak field of 23 MV cm⁻¹ and a shorter 5-fs near-infrared 
pulse to prepare electron-hole pairs at a well defined phase of the tera- 
hertz carrier wave. Figure 4e shows the coherent distribution at its 
maximal displacement, when the terahertz field has passed an intense 
half-cycle. Almost the complete distribution of coherent electron-hole 
pairs (96%) is driven into the K’ valley. Here the time evolution of the 
quantum mechanical wavefunction is effectively inverted by adiabati- 
cally shifting it through a time-reversal-invariant momentum point. 
These calculations are supported by a first experiment that utilizes sub- 
cycle injection with a 10-fs optical pulse and transfers 66% of the coher- 
ent electron-hole pairs into the K’ valley (see Extended Data Fig. 2). 
We anticipate that custom-tailored terahertz waveforms and injection 
times could allow for yet more sophisticated transfer protocols, paving 
the way to ultimately fast valleytronics. 

METHODS 


Experimental set-up. A femtosecond titanium-sapphire laser amplifier (repetition 
rate, 3 kHz; pulse energy, 5.5 mJ; pulse duration, 33 fs; centre wavelength, 805 nm) 
pumps two parallel dual-stage optical parametric amplifiers delivering signal pulses 
with energies of up to 0.5 mJ and centre wavelengths that are tuneable between 
1.1 µm and 1.6 µm. We generate intense, phase-locked waveforms in the far- to 
mid-infrared spectral region (multi-terahertz range) via difference-frequency 
generation between the spectrally detuned phase-correlated near-infrared pulse 
trains of the optical parametric amplifiers. The electric peak field of these few-cycle 
pulses reaches values of up to 1 V Å⁻¹. An yttrium aluminium garnet (YAG)-based 
super-continuum source provides ultrabroadband excitation pulses, covering the 
near-infrared and visible spectral ranges. The excitation pulse and the terahertz 
driving field are superimposed with an indium-tin-oxide-coated beam splitter 
and are collinearly focused onto a tungsten diselenide sample by a gold-coated 
parabolic mirror. The full-width-at-half-maximum (FWHM) spot sizes of the 
excitation pulse and the terahertz transient at the sample position are 22 µm and 
50 µm (FWHM of the intensity), respectively. For our experiments, the centre 
frequency of the terahertz driving field is set to ν_THz = 40 THz, while the peak 
electric field in air amounts to E_THz = 18 MV cm⁻¹ unless stated otherwise. With 
the help of a mechanical delay stage in the excitation beam path, the temporal 
overlap of both pulses is adjusted for maximum sideband emission. For a resonant 
excitation of the excitonic 1s ground state (with photon energies 1.621 eV for the 
bulk and 1.665 eV for the monolayer) in WSe₂ kept at room temperature, optical 
band-pass filters with a bandwidth of 10 nm are employed to spectrally constrain 
the white-light pulses. The generated high-order sideband radiation is recorded 
with a spectrograph coupled to a thermoelectrically cooled silicon charge-coupled 
device (CCD) camera. All spectra are corrected for the diffraction efficiency of 
the grating and the quantum efficiency of the detector, as well as the sensitivity of 
the spectrograph, for different polarizations. For the valley-selective injection of 
electron-hole pairs with σ polarized light, we employ a combination of a half-wave 
and a quarter-wave plate in the beam path of the white-light pulses. The 
polarization states of the excitation pulses and the emitted sidebands are analysed 
by a Glan-Thompson polarizer (extinction ratio, 10⁻⁵) placed after the sample. 
Employing ultrashort gating pulses (pulse duration, 10 fs), we detect the waveform 
of the terahertz transients by electro-optic sampling in a 6.5-µm-thick zinc telluride 
crystal while accounting for the detector response*!. 
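
For orientation, the photon energies and field strengths quoted in this set-up can be converted into other common units. The sketch below assumes that sideband order n sits at the exciton energy plus n terahertz photon energies, which follows the picture given in the main text; the helper names are illustrative.

```python
H_EV_S = 4.135667696e-15  # Planck constant in eV s
HC_EV_NM = 1239.841984    # h * c in eV nm


def thz_photon_ev(freq_hz: float) -> float:
    """Photon energy of the terahertz driving field."""
    return H_EV_S * freq_hz


def wavelength_nm(energy_ev: float) -> float:
    """Vacuum wavelength belonging to a photon energy."""
    return HC_EV_NM / energy_ev


def sideband_energy_ev(exciton_ev: float, order: int, freq_hz: float) -> float:
    """Assumed sideband position: exciton energy plus `order` driving photons."""
    return exciton_ev + order * thz_photon_ev(freq_hz)


def field_v_per_angstrom(mv_per_cm: float) -> float:
    """Convert MV/cm to V/angstrom (1 cm = 1e8 angstrom)."""
    return mv_per_cm * 1e6 / 1e8


monolayer_nm = wavelength_nm(1.665)                 # excitation wavelength, monolayer
order_11_ev = sideband_energy_ev(1.665, 11, 40e12)  # highest reported order
print(monolayer_nm, thz_photon_ev(40e12), order_11_ev, field_v_per_angstrom(18))
```

With these conversions, the 18 MV cm⁻¹ driving field corresponds to 0.18 V Å⁻¹ and the 40-THz photon energy to about 0.165 eV.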
Sample preparation. Bulk-like and monolayer samples of tungsten diselenide 
are mechanically exfoliated from a bulk crystal onto an intermediate substrate, 
a visco-elastic gel film (polydimethylsiloxane; Extended Data Fig. 1b). There, 
the sample thickness is verified using optical contrast. Subsequently, the samples 
are transferred to a diamond substrate grown by chemical vapour deposition 
(Extended Data Fig. 1c). In this dielectric environment, the exciton binding 
energy*” amounts to 0.25 eV, corresponding to a Keldysh parameter of 0.16 for 
the field ionization of the exciton using a terahertz amplitude of 18 MV cm⁻¹. 
The energy of the 1s A-exciton state is confirmed by absorption measurements 
employing the white-light pulses, as well as photoluminescence emission under 
continuous-wave excitation by a green laser diode operating at a wavelength of 
533 nm. All experiments are performed under ambient conditions. The crystal 
orientation is determined by second-harmonic generation. For this purpose, 
linearly polarized, 10-fs near-infrared pulses are focused onto the monolayer 
sample. The second-harmonic intensity with a polarization parallel to the fundamental, 
I_SHG,∥, is monitored using a spectrograph with a cooled silicon CCD. I_SHG,∥ 
peaks in the armchair direction (Extended Data Fig. 1a), where inversion symmetry 
is explicitly broken in the real space lattice (compare to Fig. 1a). Because monolayer 
WSe₂ preferably cleaves along high-symmetry directions, the alignment procedure 
is complemented by optical microscope images (Extended Data Fig. 1b, c). 
We estimate that the crystal orientation is determined with an uncertainty of 
approximately ±1°. 
DFT calculations. Our DFT calculations are performed with the full-potential 
linearized augmented plane-wave all-electron method, as implemented in the 
Wien2k code*’. Specific calculation parameters and an optimized structure of 
WSe₂ were taken from Kormányos et al.*4. The wavefunction (with muffin-tin 
radii of 2.46 and 2.34 atomic units for W and Se, respectively) was expanded in 
partial waves with orbital quantum numbers up to 10. For the interstitial region, 
we used a plane-wave cut-off of 6.5 Å⁻¹. The spin-orbit interaction was included 
fully relativistically for core electrons while W 5s²5p⁶4f¹⁴5d⁴6s² and Se 3d¹⁰4s²4p⁴ 
valence electrons were treated within a second variational step method*». For 
the exchange-correlation functional, we consider the generalized gradient 
approximation**. These DFT calculations provide the relevant electronic bands 
and dipole matrix elements (including both σ+ and σ− contributions) throughout 
the irreducible Brillouin zone wedge. 

Because DFT solves electronic states and matrix elements from the atomistic
level up, by using minimal assumptions, it is a first-principles approach.
Nevertheless, DFT must always approximate the correlation energy functional
because its dependence on density still remains unknown. To reproduce the
experimental bandgap and exciton resonance energies, we adjust both the bandgap and
dielectric constant of the Coulomb interaction V. We apply the Keldysh form for
V to account for the principal aspects of dielectric screening inside a monolayer.
These inputs uniquely define both the Coulomb and light-matter interactions
needed to systematically describe electron-hole excitations generated by an optical
and a terahertz field. These procedures yield an accurate description of the
excitonic features of transition-metal dichalcogenides*”.
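
For orientation, the Keldysh form is commonly written in momentum space as the bare two-dimensional Coulomb interaction divided by a screening factor 1 + r₀q. The Python sketch below implements that textbook expression in SI units. The screening length `r0` and the environment dielectric constant `kappa` are illustrative parameters assumed for this example; the values used in the calculations above are not stated here.

```python
E_CHARGE = 1.602176634e-19   # elementary charge (C)
EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)


def coulomb_2d(q: float, kappa: float = 1.0) -> float:
    """Bare 2D Coulomb interaction e^2 / (2 eps0 kappa q), in J m^2."""
    return E_CHARGE**2 / (2.0 * EPS0 * kappa * q)


def keldysh_2d(q: float, r0: float, kappa: float = 1.0) -> float:
    """Keldysh-screened interaction: bare 2D Coulomb divided by (1 + r0 * q)."""
    return coulomb_2d(q, kappa) / (1.0 + r0 * q)


# Hypothetical screening length of 4 nm, for illustration only.
R0_M = 4.0e-9
for q in (1.0e7, 1.0e8, 1.0e9):  # wave vectors in 1/m
    print(q, keldysh_2d(q, R0_M) / coulomb_2d(q))
```

The printed ratio shows the qualitative behaviour: at small q (large distances) the interaction approaches the bare Coulomb form, whereas at large q (short distances) it is strongly reduced by the monolayer's polarizability.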

Cluster-expansion computations. The cluster-expansion approach solves the 
many-body problem from the point of view of correlations, as an alternative 
perspective compared to DFT. Earlier investigations with semiconductors”’, 
the Jaynes-Cummings model** and strongly interacting atomic Bose-Einstein 
condensates show that many-body dynamics can be cast into a format exactly 
described in terms of a sequential build-up of clusters. Because cluster expansion 
defines all the resulting correlation functionals exactly*® for any given many-body 
Hamiltonian, it is also a first-principles approach. The description, furthermore, is 
non-perturbative in terms of the interaction strength*’”, which makes it ideally 
suited for studying extreme nonlinearities created, for example, by HSG. Because 
clusters build up sequentially, the dynamics on extremely short timescales is well 
described by completely omitting clusters beyond a certain level of complexity. 
This circumstance makes the cluster-expansion computations extremely efficient 
and accurate in describing ultrafast quantum kinetics of very diverse many-body 
systems. 

Here, the optical field $E_{\mathrm{opt}}(t)$ predominantly excites a coherent superposition
between the valence and the conduction band, that is, a microscopic polarization
$p_{\mathbf{k}}$. The polarization $p_{\mathbf{k}}$ can be characterized by a single-particle cluster”>*!
describing coherently driven electron-hole pairs at crystal momentum $\hbar\mathbf{k}$. The
polarization dynamics forms the backbone of the semiconductor Bloch equations;
its polarization part has the structure

$$
i\hbar\frac{\partial}{\partial t}p_{\mathbf{k}} = \tilde{\varepsilon}_{\mathbf{k}}\,p_{\mathbf{k}} - \left(1 - f^{e}_{\mathbf{k}} - f^{h}_{\mathbf{k}}\right)\Omega_{\mathbf{k}}(t) + i|e|\,\mathbf{E}_{\mathrm{THz}}(t)\cdot\nabla_{\mathbf{k}}\,p_{\mathbf{k}} + \Gamma_{\mathbf{k}}
$$

where $\tilde{\varepsilon}_{\mathbf{k}}$ is the Coulomb-renormalized electron-hole energy (defined
by the DFT input), $f^{e}_{\mathbf{k}}$ ($f^{h}_{\mathbf{k}}$) is the electron (hole) occupation at $\mathbf{k}$, and
$\Omega_{\mathbf{k}}(t) = d_{\mathbf{k}} E_{\mathrm{opt}}(t) + \sum_{\mathbf{k}'} V_{\mathbf{k}-\mathbf{k}'}\,p_{\mathbf{k}'}$ is the Rabi energy renormalized by the
Coulomb interaction $V_{\mathbf{k}}$. Both $V_{\mathbf{k}}$ and the dipole-matrix element $d_{\mathbf{k}}$ are defined
from DFT. Once the optical field $E_{\mathrm{opt}}(t)$ generates the polarization, the terahertz
field $\mathbf{E}_{\mathrm{THz}}(t)$ accelerates it via the gradient term. The generated microscopic
polarization also scatters with densities and other polarization $p_{\mathbf{k}'}$ due to Coulomb
interaction, inducing a two-particle correlation $\Gamma_{\mathbf{k}}$ that introduces
exciton-state-dependent dephasing for polarization. This full microscopic scattering is included
using the description of Smith et al.“°. This yields a fully microscopic description
of exciton-state-dependent dephasing, which considerably decreases the coherence
time for all excitons except the 1s state"!. This level of sophistication is necessary
to systematically model the decay of coherent electron-hole pairs during their life
cycle, encompassing preparation, acceleration and recollision. Because
transition-metal dichalcogenide monolayers couple strongly to light, we also include the
self-consistent coupling of the semiconductor Bloch equations with Maxwell’s
equations, following the exact approach of Kira and Koch*', using the experimental
geometry vacuum–monolayer–diamond–vacuum. This allows us to predict
both the transmitted and reflected fields in absolute units, as well as to include the
radiative decay in the analysis. We use the DFT material parameters and matrix
elements as an input to solve the full two-dimensional quantum polarization
kinetics using 170 radial wave vectors spanning the full Brillouin zone and 241
angular states.

HSG and symmetry. Broken inversion symmetry is a necessary requirement for 
odd-order sideband generation. Inversion-symmetric media, such as bulk crys- 
tals of silicon or diamond, do not facilitate odd-order HSG. Broken out-of-plane 
symmetry at surfaces, interfaces and two-dimensional materials can manifest itself 
only if the driving field has a strong out-of-plane component. As seen in Fig. 1c, the
strongest effect of HSG occurs in crystal directions in which the symmetry of the 
band structure, rather than the symmetry of the real-space lattice, is broken. This 
is caused by the fact that HSG originates from intraband currents. Interestingly, 
the third-order sideband in Fig. 1c is relatively strongly suppressed. This feature is 
indeed expected (Fig. 2) because this sideband results from electron-hole excur- 
sions that barely reach the asymmetric band structure features, whereas higher- 
order sidebands are associated with substantially longer trajectories in k space. 
Nonetheless, even in cases where the band structure is asymmetric, odd-order 
HSG may require an additional condition. Our experiment with linearly polari- 
zed excitation light exemplifies the situation. Since the band structure is mirror- 
symmetric with respect to the M point, the intraband dynamics of electron-hole 
pairs in the K valley driven by a positive terahertz field is identical to the intra- 
band dynamics of electron-hole pairs in the K’ valley driven by a sign-inverted 
terahertz field. For equal populations in both valleys, the total system may seem
inversion-symmetric at first glance. The observation of odd-order HSG (Fig. 1c, 
red curve) can be explained only if the valley pseudospin is taken into account: 
electron-hole recollisions from different valleys are disentangled by the valley- 
specific helicity of the emitted light. Alternatively, the contributions of both valleys 
can be separated by exciting only one valley with circularly polarized light (Fig. 4a). 
In any case, the emergence of odd-order sidebands in monolayer WSe₂ is a direct
consequence of the valley pseudospin in the material. We further underpin the 
influence of the valley pseudospin and intervalley transfer by a first experiment 
using subcycle injection of coherent electron-hole pairs by a 10-fs pulse (Extended 
Data Fig. 2). We extract a transfer rate of 66% marking the first subcycle transport 
of internal quantum attributes, which could inspire a development of correspond- 
ing quantum logic operations similar to spintronics*. 

Analytical derivation of the high-order sideband polarization. The theoretical
and experimental finding of cross-polarized even and odd sideband orders in the
case of a linearly polarized interband excitation (see Fig. 3) is a direct consequence
of the opposite pseudospins of the K and K’ valleys and can be derived analytically
as follows. Owing to the optical selection rules of monolayer WSe₂, radiative
electron-hole recombination in the two valleys generates light of opposite helicity.
Consequently, the total emitted field of high-order sidebands $\mathbf{E}_{\mathrm{HSG}}$ consists of
contributions from the K and K’ valley with right- ($\sigma_{+}$) and left-circularly polarized
($\sigma_{-}$) light, respectively, and can be written as

$$
\mathbf{E}_{\mathrm{HSG}} = E_{\mathrm{HSG},K}\,\boldsymbol{\sigma}_{+} + E_{\mathrm{HSG},K'}\,\boldsymbol{\sigma}_{-}
$$

where $E_{\mathrm{HSG},K}$ and $E_{\mathrm{HSG},K'}$ denote the contributions from the K and the K’ valley,
respectively. Owing to the symmetry of the band structure (see Fig. 2a), the dynamics
occurring in the K’ valley for positive fields of the driving waveform $E_{\mathrm{THz}}$ is
identical to the dynamics of the K valley for negative fields (K and K’ points are
time-reversal pairs). Hence, we can substitute $E_{\mathrm{HSG},K'}(E_{\mathrm{THz}})$ with $E_{\mathrm{HSG},K}(-E_{\mathrm{THz}})$.
By simultaneously using the relation $\boldsymbol{\sigma}_{\pm} = (\mathbf{e}_x \mp i\,\mathbf{e}_y)/\sqrt{2}$, with the Cartesian unit
vectors $\mathbf{e}_x$ and $\mathbf{e}_y$, we can rewrite the above relation as

$$
\mathbf{E}_{\mathrm{HSG}} = E_{\mathrm{HSG},K}(E_{\mathrm{THz}})\,(\mathbf{e}_x - i\,\mathbf{e}_y)/\sqrt{2} + E_{\mathrm{HSG},K}(-E_{\mathrm{THz}})\,(\mathbf{e}_x + i\,\mathbf{e}_y)/\sqrt{2}
$$

Furthermore, one can decompose $E_{\mathrm{HSG},K}$ into even- and odd-order contributions,
that is, $E_{\mathrm{HSG},K} = E_{\mathrm{even}} + E_{\mathrm{odd}}$. Because only even (odd) powers of $E_{\mathrm{THz}}$ contribute
to $E_{\mathrm{even}}$ ($E_{\mathrm{odd}}$), these fields feature even (odd) parity with respect to the terahertz
driving field. Taking these relations into account, one can state

$$
E_{\mathrm{HSG},K}(\pm E_{\mathrm{THz}}) = E_{\mathrm{even}} \pm E_{\mathrm{odd}}
$$

and we obtain

$$
\mathbf{E}_{\mathrm{HSG}} = (E_{\mathrm{even}} + E_{\mathrm{odd}})\,(\mathbf{e}_x - i\,\mathbf{e}_y)/\sqrt{2} + (E_{\mathrm{even}} - E_{\mathrm{odd}})\,(\mathbf{e}_x + i\,\mathbf{e}_y)/\sqrt{2}
$$

$$
\mathbf{E}_{\mathrm{HSG}} \propto E_{\mathrm{even}}\,\mathbf{e}_x - i\,E_{\mathrm{odd}}\,\mathbf{e}_y
$$

Consequently, the conserved circular dichroism directly leads to cross-polarized
even and odd orders. These derivations still hold for large intervalley transfer
as long as the optical selection rules hold. Our experimental observations confirm
this fact as well as the conservation of the valley coherence during lightwave
acceleration.
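
The algebra of this derivation can be checked numerically. The Python sketch below is only an illustration: it represents the circular unit vectors as (x, y) component pairs, takes arbitrary complex amplitudes for the even and odd contributions, and sums the two valley terms. The function name and the sample amplitudes are assumptions made for the example.

```python
import math

SQRT2 = math.sqrt(2.0)
# Circular unit vectors as (x, y) components: sigma_pm = (e_x -/+ i e_y) / sqrt(2).
SIGMA_PLUS = (1.0 / SQRT2, -1j / SQRT2)
SIGMA_MINUS = (1.0 / SQRT2, 1j / SQRT2)


def total_hsg_field(e_even: complex, e_odd: complex) -> tuple:
    """Return the (x, y) components of the summed K and K' sideband fields."""
    e_k = e_even + e_odd    # K valley, driven by +E_THz
    e_kp = e_even - e_odd   # K' valley, equivalent to K driven by -E_THz
    e_x = e_k * SIGMA_PLUS[0] + e_kp * SIGMA_MINUS[0]
    e_y = e_k * SIGMA_PLUS[1] + e_kp * SIGMA_MINUS[1]
    return e_x, e_y


# Arbitrary illustrative amplitudes (not measured values).
ex, ey = total_hsg_field(0.7 + 0.2j, 0.1 - 0.4j)
print(ex, ey)
```

The x component contains only the even-order amplitude and the y component only the odd-order amplitude, which is the cross-polarization stated above.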

Data availability. The data that support the findings of this study are available 
from the corresponding author upon request. 
Extended Data Fig. 1 | Sample orientation. a, Azimuthal scan of the
second-harmonic intensity polarized parallel to the excitation pulse,
$I_{\mathrm{SHG},\parallel}$ (blue curve), revealing the armchair direction at a crystal angle
of φ = 30°. The dashed line marks the expected scaling proportional to
sin²(3φ). Around the polar diagram, the hexagonal Brillouin zone of WSe₂
is depicted with the high-symmetry points. b, Optical microscope image of
the exfoliated monolayer on the visco-elastic gel film used for exfoliation.
Areas appearing in lighter grey are few-layer tungsten diselenide.
c, Monolayer sample after transfer to a diamond substrate. The contrast of
this image has been enhanced to improve the visibility of the atomically
thin WSe₂ film. The red arrows mark the same edge in b and c, which has
been identified as the zigzag direction using the SHG scan.
Extended Data Fig. 2 | Polarization of subcycle sideband emission.
Circularly polarized 10-fs near-infrared (NIR) pulses (polarization- 
resolved intensity depicted as black spheres) excite valley-polarized 
electron-hole pairs in a monolayer of tungsten diselenide. Simultaneously, 
an atomically strong terahertz wave is applied in the zigzag direction and 
may transfer electrons and holes to the non-excited K’ valley. The high- 
order sideband emission resulting from coherent electron-hole collisions 
driven by the most intense half-cycle is measured to have an elliptical 
polarization (blue spheres), and contains contributions from the opposite 
valley. Our quantum theory reproduces this polarization state (red curve) 
and reveals a transfer yield of 66% to the initially unexcited K’ valley. 
Optical-frequency synthesizers, which generate frequency-
stable light from a single microwave-frequency reference, are
revolutionizing ultrafast science and metrology, but their size,
power requirement and cost need to be reduced if they are to be
more widely used. Integrated-photonics microchips can be used
in high-coherence applications, such as data transmission¹, highly
optimized physical sensors² and harnessing quantum states³, to
lower cost and increase efficiency and portability. Here we describe a
method for synthesizing the absolute frequency of a lightwave signal,
using integrated photonics to create a phase-coherent microwave-
to-optical link. We use a heterogeneously integrated III-V/silicon
tunable laser, which is guided by nonlinear frequency combs
fabricated on separate silicon chips and pumped by off-chip lasers.
The laser frequency output of our optical-frequency synthesizer
can be programmed by a microwave clock across 4 terahertz
near 1,550 nanometres (the telecommunications C-band) with 1
hertz resolution. Our measurements verify that the output of the
synthesizer is exceptionally stable across this region (synthesis error
of 7.7 × 10⁻¹⁵ or below). Any application of an optical-frequency
source could benefit from the high-precision optical synthesis
presented here. Leveraging high-volume semiconductor processing
built around advanced materials could allow such low-cost, low-
power and compact integrated-photonics devices to be widely used.

The electronics revolution that began in the mid-twentieth century 
was driven in part by advances related to the synthesis of radio and 
microwave-frequency signals for applications in radar, navigation and 
communications systems. This formed a foundation for more recent 
technologies of wide impact, such as the Global Positioning System 
and cellular communications. Direct-digital synthesis now operates 
at >10 GHz rates with watt-scale power. Despite the ubiquity of elec- 
tronic synthesis, no comparable technology existed for electromagnetic 
signals in the optical domain until the introduction of the self-referenced 
optical-frequency comb*”. An optical-frequency comb can provide the 
critical phase-coherent link between microwave and optical domains, 
with an output consisting of an array of optical modes having frequencies
given exactly by $\nu_n = n f_{\mathrm{rep}} + f_{\mathrm{ceo}}$, where $f_{\mathrm{rep}}$ and $f_{\mathrm{ceo}}$ are microwave
frequencies and n is an integer. Over the past two decades, optical-
frequency synthesizers using mode-locked-laser frequency combs have 
been demonstrated®’. The optical-synthesizer output, derived from a 
reference clock, is invaluable for coherent light detection and ranging’, 
atomic and molecular spectroscopy and optical communications. 
Optical-frequency-comb technology has also matured so that a typi- 
cal erbium-fibre comb system requires approximately 2 W of optical 
pump power’. 
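
The comb relation ν_n = n f_rep + f_ceo can be made concrete with a few lines of Python. This is a minimal sketch: the repetition rate, offset frequency and mode number below are hypothetical round numbers chosen for illustration, not parameters of any comb described in this work, and the helper names are likewise assumptions.

```python
def comb_mode_frequency(n: int, f_rep_hz: int, f_ceo_hz: int) -> int:
    """Frequency of comb mode n in Hz: nu_n = n * f_rep + f_ceo."""
    return n * f_rep_hz + f_ceo_hz


def nearest_mode_index(nu_hz: int, f_rep_hz: int, f_ceo_hz: int) -> int:
    """Index of the comb mode closest to the optical frequency nu_hz."""
    return round((nu_hz - f_ceo_hz) / f_rep_hz)


# Hypothetical comb parameters, for illustration only.
F_REP_HZ = 250_000_000   # 250 MHz mode spacing
F_CEO_HZ = 20_000_000    # 20 MHz carrier-envelope offset

nu = comb_mode_frequency(776_000, F_REP_HZ, F_CEO_HZ)
print(nu)                                                      # optical frequency in Hz
print(nearest_mode_index(nu + 1_000_000, F_REP_HZ, F_CEO_HZ))  # laser 1 MHz above the mode
```

Because both f_rep and f_ceo are microwave frequencies, fixing them against a clock fixes every optical mode, and an unknown laser frequency can be assigned to a mode index plus a small measurable beat.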

A new opportunity for chip-integrated optical-frequency synthe- 
sis has emerged with development in heterogeneously integrated 
photonics? and photonic-chip microresonator frequency combs, or 
microcombs!*!”. Microresonators pumped by a continuous-wave (CW) laser generate
a parametric four-wave mixing comb in dielectric media. Relying on waveguide
confinement and high nonlinearity
of the integrated photonics, microresonators provide a route to comb 
generation with only milliwatts of input power!” and high pump- 
conversion efficiency’®. Precise waveguide group-velocity dispersion 
(GVD) control!’, combined with the realization of low-noise dissipative 
Kerr solitons (DKSs)”"-””, has led to octave-spanning optical spectra 
with dispersive waves”?-?> to enhance the signal-to-noise ratio in 
microcomb carrier-envelope-offset frequency ($f_{\mathrm{ceo}}$) detection?®**. In
parallel, through heterogeneous integration it has become possible to 
seamlessly combine active and passive components, such as semicon- 
ductor lasers and amplifiers, electro-optic modulators, passive wave- 
guides, photodiodes and complementary metal-oxide-semiconductor 
(CMOS) electronics on a silicon-chip platform’, and specifically to 
implement phase-locking of integrated lasers to microcombs””*°. Our 
work makes use of Kerr-soliton frequency combs and silicon photonics 
to realize optical-frequency synthesis derived phase-coherently from 
an electronic clock. 

Mirroring the framework of most traditional optical and microwave 
synthesizers, our system is composed of a tunable laser oscillator that 
we phase-lock to a stabilized microcomb reference. Figure 1a presents 
the concept of a future integrated synthesizer, and Fig. 1b indicates the 
connections between the integrated tunable laser and the chip-based 
Kerr-comb components that are used in this work. We use the C-band 
tunability, narrow linewidth and rapid frequency control of a III-V/ 
silicon ring-resonator” laser as the synthesizer output, and the phase- 
coherent microwave-to-optical connection of a fully stabilized DKS 
frequency comb. The DKS dual comb consists of an octave-bandwidth, 
silicon nitride comb with 1 THz mode spacing and a C-band-spanning, 
fused-silica comb with 22 GHz mode spacing. By phase-stabilizing 
both comb spacings ($f_{\mathrm{rep,THz}}$ and $f_{\mathrm{rep,GHz}}$) and the silicon nitride comb’s
offset frequency, $f_{\mathrm{ceo,THz}}$, we establish the precise factor of 19,403,904
phase-coherent multiplication from 10 MHz to the optical domain.
With this tunable-laser and frequency-comb system, we demonstrate 
synthesis across a 4-THz segment of the C-band by programming and 
dynamically stepping the output frequency; see Fig. 1c-e. As the role of 
any synthesizer is to output a phase-coherently multiplied version of the 
input clock, we characterize the optical synthesizer primarily through 
its fluctuations with an out-of-loop frequency comb derived from the 
same clock. A fully integrated synthesizer, realized by using, improv- 
ing and connecting the chip components that we describe, would be a 
powerful tool for many applications (see Methods). 

To demonstrate the optical-frequency synthesizer, we carry out a 
series of experiments characterizing its output frequency. Standard 
spectrometer or interferometer measurements readily verify system 
performance at the megahertz (or 10⁻⁸) level. By measuring the synthesizer
with an auxiliary self-referenced erbium-fibre comb, we constrain
the frequency error between the output and the synthesizer’s
Fig. 1 | Accurate optical synthesis with an integrated laser and DKS
dual-comb system. a, Conceptual integrated optical-frequency synthesizer 
with digital control and f-2f stabilization, using the microcombs and 
tunable laser of this work. PD, photodetector. b, Our optical synthesizer 

is composed of an integrated tunable laser and chip-based Kerr-comb 
generators. Green boxes indicate the tabletop subsystems including 

the chips, and how they connect. The CW pumping laser for some 
experiments is a second integrated laser; see Methods. The tunable laser 

is synthesized by phase-locking to the stabilized combs, using a look-up 
table (LUT) and FPGA. QPSK, quadrature phase shift key modulator. 


setpoint to <1.5 Hz. Beyond demonstration of the integrated- 
photonics architecture, the core result of our work is verification that 
the synthesizer offers sufficient phase control and synchronization in 
microwave-to-optical conversion (as do the auxiliary comb and our 
frequency-counting electronics) to reveal a stable phase correlation 
between the CW output and the radiofrequency (RF) clock. Hence, the 
statistical fluctuations that lead to the synthesizer’s instability, and our 
measurement of these, offer the complete description of the synthesizer’s 
frequency performance. 

The chip-based integrated components of the synthesizer—the tuna- 
ble laser (Fig. 1f) and DKS frequency combs (Fig. 1g and h)—and their 
key connections with non-integrated components are emphasized in 
Fig. 1b. An external cavity pump laser is used to generate both of the 
DKS combs, using independent control with single-sideband frequency 
shifters and erbium amplification for each comb. An octave-spanning 
single-pulse soliton is generated in a Si₃N₄ planar waveguide-coupled
resonator. In addition to the anomalous GVD profile, waveguide-
dispersion engineering creates dispersive-wave peaks in optical power
that appear at 999 nm and 2,190 nm, owing to the zero-integrated GVD
starting from the pump wavelength. With a radius of 23 μm, the threshold
for octave-spanning spectra is brought to below 50 mW of on-chip
pump power”, at the expense of an $f_{\mathrm{rep,THz}}$ of 1.014 THz that cannot
be easily photodetected and reduced to a microwave frequency with
conventional electronics. Rather, we rely on a second frequency comb
to bridge the gap between Si₃N₄ THz comb modes.
c, Optical spectra of the laser across 32 nm. d, e, Measurements of the
synthesizer output as it is stepped. The data indicate the deviation between
the synthesizer output $\nu_{\mathrm{out}}$ and its setpoint for mode-hopping across
the 22-GHz SiO₂ modes (d) and for application of precise frequency
steps of 15.36 Hz (e). f, Scanning electron microscope (SEM) image of
the heterogeneous III-V/Si tunable laser with false colour electrodes
(yellow) and waveguides (blue). g, Photograph of the SiO₂-based wedge
microresonator. h, SEM image of the Si₃N₄ THz resonator with false colour
imposed on the waveguide regions.


To do this, an SiO₂ wedge-based whispering-gallery-mode resonator
with a quality factor (Q) of 180 million is used to create a DKS
frequency comb at $f_{\mathrm{rep,GHz}} \approx 22$ GHz (ref. ”*). As the threshold for
soliton-comb generation scales inversely to both the repetition rate
and Q², use of an SiO₂ device is important for low-power operation.
The repetition frequency of 22 GHz is photodetected and phase-locked
to the RF clock. This first step in the microwave-to-optical
frequency chain (Fig. 2a) from $f_{\mathrm{clk}}$ = 10 MHz to 22 GHz partially
stabilizes the SiO₂ reference comb to guide tunable laser synthesis;
see Fig. 2b. The second step is detection of the 1.014 THz frequency
spacing between Si₃N₄ comb teeth, which we accomplish using
the 46th relative comb line from the SiO₂ comb. Operationally, we
measure $f_{\mathrm{rep,THz}}$ by detecting the optical heterodyne beat note between
the two combs 1 THz away from the pump. We phase-lock this signal
to a synthesized radiofrequency, $f_1 = \alpha f_{\mathrm{clk}}$ (where $\alpha$ is the ratio of
two integers), after removing the relative contributions from the
single-sideband frequency shifters and feeding back to the frequency
of the Si₃N₄ pump laser*!**. Thus, we stabilize $f_{\mathrm{rep,THz}}$ and transfer
the $f_{\mathrm{clk}}$ stability to 1.014 THz. The frequency of each of the Si₃N₄ THz
comb lines with negative offset frequency and mode number N = 192
is then given by:

$$
\begin{aligned}
\nu_{\mathrm{THz}} &= N f_{\mathrm{rep,THz}} - f_{\mathrm{ceo,THz}} \\
\nu_{\mathrm{THz,pump}} &= 192\,(46 f_{\mathrm{rep,GHz}} + \alpha f_{\mathrm{clk}}) - f_{\mathrm{ceo,THz}}
\end{aligned}
\qquad (1)
$$
Fig. 2 | Optical spectra of the integrated devices. a, Schematic diagram of
spectral combination with the integrated devices, and the frequency chain
used to multiply the 10-MHz clock to the optical domain. b, Combined
spectrum of the SiO₂ 22-GHz wedge microcomb and the heterogeneously


Next, $f_{\mathrm{ceo,THz}}$ locking is achieved by using the octave-spanning relationship
of the THz lines at 1,998 nm and dispersive wave peak at
999 nm (Fig. 2c). To aid f-2f self-referencing (which enables determi- 
nation of the absolute frequency of each comb line), an independent 
diode laser and thulium-doped fibre amplifier at 1,998 nm supply 9 mW 
to a waveguide periodically poled lithium niobate (PPLN) device to 
Fig. 3 | Stable optical synthesis with out-of-loop verification. a, Tunable
laser locking, and frequency counting with the auxiliary comb. HNLF,
highly nonlinear fibre. b, Measured overlapping Allan deviation (ADEV)
and modified Allan deviation (MDEV) of the frequency synthesizer.
In comparing 10-ms counter-gate time acquisitions, the 1/τ slope is
consistent with a stable, phase-locked synthesizer, and the histograms
of 500 s of data (inset for relative mode m = −28 only) show a Gaussian
integrated III-V/Si tunable laser in the telecommunication C-band.
c, Combined spectrum of the octave-spanning Si₃N₄ THz microcomb and
the 22-GHz SiO₂ wedge microcomb, as measured on two optical spectrum
analysers.


achieve 34 dB signal-to-noise ratio (SNR) on $f_{\mathrm{ceo,THz}}$. Similar monolithic
second-harmonic generation and amplifier technologies have been
demonstrated and could be integrated with our system (see Methods).
After detecting two heterodyne beats with the THz comb, $f_{999}$ and $f_{1998}$,
each beat note is digitally divided by 64 and 32, respectively, and frequency
mixing yields an $f_{\mathrm{ceo,THz}}$ signal, $f_{\mathrm{ceo,THz}}/64 = f_{999}/64 - f_{1998}/32$.
profile. Error bars indicating 95% confidence intervals are derived using
flicker noise estimates (see Methods). c, Table of nominal frequencies
and uncertainty at 200 s as the synthesizer is stepped across the C-band.
d, Overview of the accuracy and precision of the synthesizer frequency.
The ADEV at 100 s is used to estimate the uncertainty of each synthesizer
output, and the weighted mean of the seven data points is reported with a
95% (t distribution) confidence interval.
Fig. 4 | Arbitrary control of the optical-frequency synthesizer. a, Step-
wise control of the tunable-laser offset phase-lock to the DKS comb and
frequency counting. b, Deviation between the synthesizer output $\nu_{\mathrm{out}}$ and
constant setpoint for 500 s at a 10-ms gate time. c, Bidirectional linear
ramp of the synthesizer via step control of the laser offset PLL setpoint


Phase-locking this signal to a radiofrequency $f_2 = \beta f_{\mathrm{clk}}$, through feedback
to the Si₃N₄ pump power, completes the transfer of stability from
$f_{\mathrm{clk}}$ to all the THz comb lines spanning 130 THz to 300 THz.

The dual-stabilized combs serve as the backbone to guide the heterogeneously
integrated III-V/Si tunable laser for arbitrary optical-
frequency synthesis across the C-band. The tunable laser consists of
InGaAsP multiple-quantum-well epitaxial material that is wafer-
bonded onto a lithographically patterned silicon-on-insulator wafer?.
Bias heaters integrated on the laser’s Si-based resonant reflectors and
phase section are used to shift the lasing wavelength for initial alignment
to the comb lines. By using Si waveguides that have low loss relative
to standard telecommunication-grade InP waveguide technology,
reduced linewidths of about 300 kHz are achieved. The combined optical
spectrum of the SiO₂ comb and integrated laser’s tuning range is
shown in Fig. 2b. Heterodyning with the DKS dual-comb signal at a
relative mode m from the pump creates a signal, $f^{\mathrm{laser}}_{\mathrm{beat}}$, for input to a
field-programmable gate array (FPGA)-based phase-locked loop
(PLL, Fig. 3a) with a local oscillator of $f_3 = \gamma f_{\mathrm{clk}}$, and digital division of
512. This final laser lock to the DKS dual comb produces a fully stabilized,
tunable synthesizer output,

$$
\begin{aligned}
\nu_{\mathrm{out}} &= \nu_{\mathrm{THz,pump}} + m f_{\mathrm{rep,GHz}} + f^{\mathrm{laser}}_{\mathrm{beat}} \\
&= f_{\mathrm{clk}}\,[192\,(46 \times 2{,}197 + \alpha) + 2{,}197\,m - 64\beta + 512\gamma]
\end{aligned}
\qquad (2)
$$

This expression shows that the output of our integrated-photonics
synthesizer is uniquely and precisely defined relative to the input
clock frequency in terms of user-chosen integers and ratios of integers
(α, β, γ).
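
To make the frequency chain concrete, the Python sketch below evaluates the multiplier in equation (2), read here as ν_out = f_clk [192 (46 × 2,197 + α) + 2,197 m − 64β + 512γ], using exact rational arithmetic. The function name and the choice of α = β = γ = 0 in the example are assumptions made for illustration; the actual lock ratios used in the experiment are not stated in this section.

```python
from fractions import Fraction

F_CLK_HZ = 10_000_000  # 10 MHz reference clock


def synthesizer_output_hz(m, alpha, beta, gamma, f_clk_hz=F_CLK_HZ):
    """Evaluate equation (2) exactly; alpha, beta, gamma may be ratios of integers."""
    multiplier = (
        192 * (46 * 2197 + Fraction(alpha))
        + 2197 * m
        - 64 * Fraction(beta)
        + 512 * Fraction(gamma)
    )
    return multiplier * f_clk_hz


# With all lock offsets set to zero, the multiplier reduces to 192 * 46 * 2,197.
base = synthesizer_output_hz(m=0, alpha=0, beta=0, gamma=0)
print(base / F_CLK_HZ)   # multiplication factor from the clock
print(float(base))       # optical frequency in Hz

# Moving to the neighbouring 22-GHz comb line changes m by one.
step = synthesizer_output_hz(1, 0, 0, 0) - base
print(float(step))       # frequency change in Hz
```

With zero offsets the multiplier equals the factor of 19,403,904 quoted in the text, and changing m by one shifts the output by 2,197 × 10 MHz = 21.97 GHz, the SiO₂ comb spacing.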

Agile tuning across SiO₂ comb lines (varying m) and hertz-level tuning
resolution on the same comb line (varying γ) have already been
presented in Fig. 2b and Fig. 1c, demonstrating synthesizer operation.
To explore our synthesizer’s phase coherence, we perform an out-of-loop
optical-frequency characterization by heterodyning $\nu_{\mathrm{out}}$ against
an auxiliary erbium-fibre laser frequency comb that is fully stabilized
to the same $f_{\mathrm{clk}}$. Figure 3 shows results from a study of the tunability and
(100-ms gate). d, Arbitrary frequency control of the synthesizer across
40 frequency setpoints to write “NIST”. A 30-ms gate time is used to 
oversample each frequency setpoint by 5 (150-ms pause per point), and 
every fifth data point is displayed. 


phase-locked operation of the synthesizer across all comb frequencies
by locking to five adjacent SiO₂ comb lines, and to the highest and lowest
wavelengths of the laser tuning range. Overlapping Allan deviation
(ADEV) analysis of the counted beat notes against the auxiliary comb
shows the instability improving as <10⁻¹²/τ for all recorded averaging
times τ, and reaching an average instability of (4.2 ± 0.4) × 10⁻¹⁵ at
200 s (Fig. 3b, c). More sophisticated triangular averaging analysis using
the modified Allan deviation (MDEV) yields an order of magnitude
better instability of (9.2 ± 1.4) × 10⁻¹⁴ at 1 s. Still, the 1/τ dependence
of the ADEV data, which characterizes the fluctuations of the optical-
frequency synthesizer, indicates the stable phase relationship between
the RF clock and the synthesized optical frequencies. Moreover, the
synthesizer performance is consistent with the hydrogen-maser RF
clock used in the experiments, indicating that our phase locks of the
tunable laser, the Kerr combs and the auxiliary erbium-fibre comb
contribute negligible noise. This is the most fundamental metric of an
optical synthesizer. From the mean values of the measured beats with
the auxiliary comb, we can further analyse potential deviations of the
synthesizer output from equation (2). Data compiled from the seven
experiments are shown in Fig. 3d with 100-s ADEV error bars plotted,
and the weighted mean of all data sets with a 95% confidence interval
is (1.2 ± 1.5) Hz. Thus, based on these initial data, we conclude that
our integrated-photonics optical synthesizer accurately reproduces the
input clock reference within an uncertainty of 7.7 × 10⁻¹⁵, competitive
with commercial optical synthesizers (5 × 10⁻¹³ instability at 1 s and
accurate to 10⁻¹⁴ at 120 s).
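
For readers unfamiliar with the statistic, the Python sketch below shows one standard way to compute an overlapping Allan deviation from a series of fractional-frequency readings, each averaged over a counter gate time τ₀. It is a generic textbook estimator, not the analysis code used for Fig. 3; the function name and the synthetic test series are assumptions made for illustration.

```python
import math


def overlapping_adev(y, tau0, m):
    """Overlapping Allan deviation at averaging time tau = m * tau0.

    y    -- fractional-frequency samples, each averaged over tau0 seconds
    tau0 -- gate time of a single sample, in seconds
    m    -- averaging factor (positive integer)
    """
    # Integrate frequency to phase (time-error) samples x_k.
    x = [0.0]
    for yi in y:
        x.append(x[-1] + yi * tau0)
    n = len(x)
    if m < 1 or n < 2 * m + 1:
        raise ValueError("not enough samples for this averaging factor")
    tau = m * tau0
    # Sum of squared second differences of the phase, taken at every offset.
    total = sum(
        (x[i + 2 * m] - 2.0 * x[i + m] + x[i]) ** 2 for i in range(n - 2 * m)
    )
    return math.sqrt(total / (2.0 * (n - 2 * m) * tau**2))


# Synthetic example: a pure linear frequency drift D gives ADEV = D * tau / sqrt(2).
DRIFT, TAU0 = 1e-3, 0.5
y_drift = [DRIFT * i * TAU0 for i in range(60)]
print(overlapping_adev(y_drift, TAU0, 4))
```

For a phase-locked synthesizer the deviation computed this way falls as 1/τ, which is the signature referred to in the text; a frequency drift, as in the synthetic example, would instead make it grow with τ.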

To demonstrate the tunability of the optical-frequency synthesizer, 
we perform two different types of tuning while the laser is locked to 
the stabilized comb system (Fig. 4). As a baseline, without changing the 
setpoint of the tunable laser phase-lock, the raw data of the counted 
auxiliary comb beat note are shown in Fig. 4b after subtraction of the 
nominally expected frequency for 500 s. We then apply a bidirectional
linear ramp over eight levels with a 2-s pause at each level to ensure 
successful locking (Fig. 4c). Finally, we programme a series of setpoint 
frequencies to the FPGA PLL box to write out the National Institute of 
Standards and Technology (NIST) logo (Fig. 4d). Excellent agreement 
is found between the expected offset frequencies and the counted beat-
note frequencies for all cases, illustrating good dynamic control of the 
synthesizer. 

In summary, the experiments that we present, performed with an 
optical-frequency synthesizer constructed from integrated photonics, 
demonstrate that this technology has achieved the high precision and 
accuracy that formerly has been confined to tabletop mode-locked 
laser frequency-comb devices. For further integration of the laser and 
Kerr combs used in our experiments, targeted improvements should 
be made to increase microresonator Q for lower-power operation, to 
improve the intensity of the Si₃N₄ comb dispersive waves for f–2f stabilization,
and to improve the efficiency of second-harmonic generation,
guided by the applications that are envisaged for the device. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
METHODS


Device and experimental details. The heterogeneously integrated III-V/Si device 
includes a tunable laser and a semiconductor optical amplifier (SOA). At room 
temperature, the laser emits up to 4 mW of CW power, and the SOA provides an
on-chip small-signal gain >10 dB. The laser contains a gain section, a phase section 
and two microresonators designed for high quality factor. The gain section and 
the SOA consist of electrically pumped InP-based quantum wells heterogeneously 
integrated on a Si waveguide. Heating of the Si microresonators and passive
phase section is performed with current injection to metal heaters above the
waveguides. By intentionally mismatching the radii of the microresonators to make 
use of the Vernier effect, we can use a narrowband intracavity optical filter to select 
the desired longitudinal mode for lasing with high side-mode suppression ratio.
Precise wavelength tuning and linewidth narrowing are performed by heating the
phase section (Extended Data Fig. 1). Phase-locking of the laser to the microcomb 
is performed by electronically dividing the beat note by 512 and using FPGA-based
digital PLL + PI²D feedback (that is, a proportional-integral-derivative controller
with two-stage integration) to the gain section of the laser. Other works have also
demonstrated high-bandwidth optical-PLL phase locks to frequency combs.
In the current system, the tunable laser can lock to either the SiO2 or Si3N4 comb
lines, which we have shown for m = −138, or 3 THz red of the pump laser. The
DC linear tuning coefficient of the tunable laser is approximately 200 MHz mA⁻¹.

A commercial external cavity diode laser is used as the shared pump for both 
microresonator comb generators in all the synthesizer experiments. The output
of a 3-dB splitter goes to separate LiNbO3 single-sideband modulators and
erbium-doped fibre amplifiers for each device. Frequency detuning from each 
microcomb resonance for soliton generation is controlled with an amplified voltage- 
controlled oscillator and arbitrary waveform generator that produces a voltage 
ramp. Although complex soliton crystals can form in these devices, single solitons
are generated through linear voltage ramps of 5 GHz in 100 ns and 100 MHz in 3 µs
for the Si3N4 and SiO2 microcombs, respectively. Once initiated, feedback to
each voltage-controlled oscillator controls frep,GHz and frep,THz for the appropriate
device. Intensity modulation on the Si3N4 microcomb to control fceo,THz is performed
with a free-space acousto-optic modulator, although on-chip SOAs are
expected to be viable as well. Lensed fibres with a 2.5-µm spot size are used to
couple light on and off the Si3N4 chip with 7 dB of insertion loss per facet. During
operation, the on-chip pump power for the Si3N4 microcomb is about 160 mW, or
ten times the threshold for soliton generation. Tapered single-mode fibre is used to
couple 80 mW to the SiO2 microcomb for soliton generation at 12 times the soliton
threshold. Recent results show that this platform can be integrated with Si3N4 bus
waveguides. An offset Pound-Drever-Hall lock is required after ramping to keep
the SiO2 pump frequency at 22 MHz red detuned from resonance.

During operation of the optical-frequency synthesizer, the separate single- 
sideband modulators for each microcomb device create a detectable offset in pump 
frequencies, about 5 GHz in our experiment. This is readily subtracted from or 
added to the necessary heterodyne beat notes in the system using an electronic 
frequency mixer, specifically after frep,THz detection between comb lines and after
the III-V/Si laser heterodyne with the DKS comb. The calibrated gain sign of the 
tunable-laser feedback loop ensures that the tunable laser is on the appropriate 
side of the SiO2 comb modes when electronically subtracting or adding this offset,
and knowledge of the absolute difference in pump frequencies is not required for 
accurate optical-frequency synthesis. We observe non-zero synthesis error when 
the SNR of any heterodyne beat falls well below the optimal level of 30 dB, but 
measurements reported here were acquired with sufficient SNR. We also observe 
and minimize contributions from out-of-loop optical and electrical path lengths, 
alignment drift, and glitches during long acquisitions. The RF synthesis and 
phase-locking electronics used in the experiments are benchtop scale, but in the 
future would make use of CMOS integration.

Auxiliary comb details and frequency counting. The auxiliary comb used for 
out-of-loop verification of the optical-frequency synthesizer consists of a 250-MHz 
erbium-fibre mode-locked laser frequency comb. The laser output is amplified
and spectrally broadened to an octave to enable self-referenced detection of the
carrier envelope offset frequency, fceo. The fourth harmonic of frep is phase-locked
to a reference synthesizer at 999.999 544 MHz, and fceo is electronically divided by 8
and phase-locked to another synthesizer at 20 MHz. Both of these synthesizers are
referenced to the same fclk that is the input to the integrated-photonics synthesizer,
yielding a comb against which any frequency of the microcomb or tunable laser 
output can be compared. 
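
The two locking conditions above fix the auxiliary comb completely. As a minimal sketch, assuming the usual comb relation f_n = n·frep + fceo with a positive fceo (the sign convention and the helper names below are illustrative assumptions, not taken from the experiment), the comb parameters and the beat against the nearest comb mode can be computed as follows:

```python
# Frequencies in Hz, kept as integers so the arithmetic is exact.
F_REP_4TH_HARMONIC_HZ = 999_999_544  # reference for 4 * f_rep (from the text)
F_CEO_DIVIDED_HZ = 20_000_000        # reference for f_ceo / 8 (from the text)

F_REP_HZ = F_REP_4TH_HARMONIC_HZ // 4  # 249_999_886 Hz, divides exactly
F_CEO_HZ = F_CEO_DIVIDED_HZ * 8        # 160_000_000 Hz


def comb_mode_hz(n):
    """Frequency of comb mode n, assuming f_n = n * f_rep + f_ceo."""
    return n * F_REP_HZ + F_CEO_HZ


def nearest_mode(f_optical_hz):
    """Return (mode index, signed beat f_optical - f_n) for the closest mode."""
    n = round((f_optical_hz - F_CEO_HZ) / F_REP_HZ)
    return n, f_optical_hz - comb_mode_hz(n)


if __name__ == "__main__":
    # Hypothetical optical frequency: 30 MHz above mode 773,600 (near 193.4 THz).
    f_test = comb_mode_hz(773_600) + 30_000_000
    print(nearest_mode(f_test))
```

Because the beat with the nearest mode never exceeds frep/2 (about 125 MHz), a 45-MHz-wide bandpass filter in front of the counter is compatible with this mode spacing.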

The beat-note frequency between the integrated-photonics synthesizer and 
the erbium-fibre frequency comb is amplified and bandpass-filtered (45 MHz 
bandwidth), after which a zero-dead-time frequency counter registers the frequency
for a fixed gate time. The rectangular binning, or Π-mode, is used during
measurement and for the ADEV analysis. The MDEV analysis applies a triangular
averaging window to the frequency data for further information on the noise type.
With this analysis, a τ^(−3/2) slope shows the desired white phase noise performance,
and deviation from this slope reveals that unwanted flicker phase noise contributes to
system performance at longer averaging times. Because the degrees of freedom
depend on noise type, we take the conservative estimate of flicker phase noise to
derive 95% confidence intervals. The tunable laser PLL also contains an in-loop
frequency counter, which showed tight phase-locking of the laser to the microcomb
at <10^(−13)/τ, limited by the resolution of the counter. All RF synthesizers in the
experimental set-up, auxiliary comb and frequency counter are tied to the same
hydrogen maser signal, serving as fclk.
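
For readers who want to reproduce this kind of stability analysis, the following is a minimal sketch of an overlapping Allan deviation computed from zero-dead-time counter readings. It is a standard textbook estimator, not the analysis code used for the measurements reported here; the function name and arguments are illustrative assumptions.

```python
import math
from itertools import accumulate


def overlapping_adev(y, tau0, m):
    """Overlapping Allan deviation at averaging time m * tau0.

    y    : fractional-frequency readings from a zero-dead-time counter
    tau0 : gate time of each reading, in seconds
    m    : averaging factor (integer >= 1)
    """
    # Phase (time-error) samples: x[k] = tau0 * sum(y[:k]), with x[0] = 0.
    x = [0.0] + [tau0 * s for s in accumulate(y)]
    n_terms = len(x) - 2 * m
    if n_terms < 1:
        raise ValueError("need at least 2*m frequency readings")
    # Sum of squared second differences of the phase at lag m.
    total = sum(
        (x[i + 2 * m] - 2.0 * x[i + m] + x[i]) ** 2 for i in range(n_terms)
    )
    tau = m * tau0
    return math.sqrt(total / (2.0 * tau * tau * n_terms))


if __name__ == "__main__":
    # Synthetic check: a linear frequency drift of d per sample gives d*m/sqrt(2).
    drift = [1e-12 * i for i in range(100)]
    for m in (1, 4, 10):
        print(m, overlapping_adev(drift, 1.0, m))
```

The modified Allan deviation differs in that the phase second differences are additionally averaged over m adjacent samples before squaring, which is what allows it to separate white from flicker phase noise.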

Perspectives and future work. A critical element to operation of the optical 
synthesizer is the pump laser of the DKS microcombs. We show that the same 
III-V/Si tunable laser from this work can be used to generate low-noise solitons 
in the Si3N4 microresonator; see Extended Data Fig. 2. Further development is
required to stabilize solitons in the high-Q SiO2 microresonator with the III-V/Si
tunable laser, although we observe modulation instability (non-soliton) Kerr
combs and their transient decay through the Kerr soliton stability regime. Further
technical improvement of the III-V/Si tunable laser would probably permit soliton
stabilization. At present, we require an optical power of 80 mW for the SiO2 comb,
160 mW on-chip for the Si3N4 comb and 9 mW on-chip for the PPLN device. In
each case, we anticipate improving the chip-device performance to be compatible
with available integrated-laser power levels to support further integration of our
frequency synthesizer. In future implementations of our optical-frequency synthesizer,
technical improvements such as improved on- and off-chip coupling,
long-wavelength SOAs and higher-efficiency second-harmonic generation would
make the 1,998-nm diode laser unnecessary.

Data availability. The data sets generated and/or analysed during the current study 
are available from the corresponding authors on reasonable request. 
Extended Data Fig. 1 | Tuning details for III-V/Si laser. a, Typical tuning
map of the III-V/Si tunable laser's peak wavelength in nanometres versus
current applied to each heater above the ring resonators. b, Normalized
optical spectra showing >40 dB of side-mode suppression ratio across
the tuning range. c, Typical unlocked RF beat notes between the tunable
laser and the auxiliary comb for two different biases of the phase section.
Careful control of the heater is required to reach all wavelengths in the
tuning range, and reduction of the laser linewidth (blue to red) through
longitudinal mode alignment and the optical feedback effect is required
to achieve the best phase-locking performance to the microcombs. RBW,
resolution bandwidth; VBW, video bandwidth.
Extended Data Fig. 2 | Demonstration of pumping the Si3N4 THz
microcomb with the III-V/Si laser. a, Output optical spectrum of the
THz microcomb showing dual-dispersive waves, as measured on two
optical spectrum analysers. b, Comparison of electro-optic repetition rate
detection when using the same III-V/Si laser (black) and external cavity
diode laser (ECDL, red) from the main experiment to pump the THz
microcomb.
Hyperexpandable, self-healing macromolecular
crystals with integrated polymer networks 


Ling Zhang1, Jake B. Bailey1, Rohit H. Subramanian1 & F. Akif Tezcan1,2*


The formation of condensed matter typically involves a trade- 
off between structural order and flexibility. As the extent and 
directionality of interactions between atomic or molecular 
components increase, materials generally become more ordered but 
less compliant, and vice versa. Nevertheless, high levels of structural 
order and flexibility are not necessarily mutually exclusive; there are 
many biological (such as microtubules, flagella, viruses) and
synthetic assemblies (for example, dynamic molecular crystals
and frameworks) that can undergo considerable structural
transformations without losing their crystalline order and that have
remarkable mechanical properties that are useful in diverse
applications, such as selective sorption, separation, sensing
and mechanoactuation. However, the extent of structural changes
and the elasticity of such flexible crystals are constrained by the 
necessity to maintain a continuous network of bonding interactions 
between the constituents of the lattice. Consequently, even the 
most dynamic porous materials tend to be brittle and isolated as 
microcrystalline powders, whereas flexible organic or inorganic
molecular crystals cannot expand without fracturing. Owing 
to their rigidity, crystalline materials rarely display self-healing 
behaviour. Here we report that macromolecular ferritin crystals
with integrated hydrogel polymers can isotropically expand to 180 
per cent of their original dimensions and more than 500 per cent 
of their original volume while retaining periodic order and faceted 
Wulff morphologies. Even after the separation of neighbouring 
ferritin molecules by 50 ångströms upon lattice expansion, specific
molecular contacts between them can be reformed upon lattice 
contraction, resulting in the recovery of atomic-level periodicity and 
the highest-resolution ferritin structure reported so far. Dynamic 
bonding interactions between the hydrogel network and the ferritin 
molecules endow the crystals with the ability to resist fragmentation 
and self-heal efficiently, whereas the chemical tailorability of 
the ferritin molecules enables the creation of chemically and 
mechanically differentiated domains within single crystals. 
Hydrogel polymers present a stark contrast to molecular crystals in 
that they lack structural order, but are highly elastic and adaptive, can 
expand considerably and self-heal when equipped with dynamic bonding
functionalities. Previously, the isotropic swelling-contraction
behaviour of hydrogels has been used to modulate the lattice spacing
of colloidal nanoparticle arrays, and recently, to expand biological
tissue samples and thus facilitate high-resolution fluorescence imaging.
In this study, we examine whether the mechanical properties
of hydrogels could be conferred upon molecular crystals. That is, can
crystal lattices that are formed by discrete molecules that are connected 
via specific bonding interactions be mechanically modulated through 
the integration of polymeric hydrogels? To create hydrogel-expandable 
molecular crystals, we surmised that the following design parameter 
conditions should be met: (1) lattices should be mesoporous to enable 
the hydrogel network to penetrate efficiently and uniformly into the 
crystals; (2) intermolecular interactions between the constituents of 
the lattices should be reversible and chemically specific (that is, contain
directional and dynamic bonds), such that they disengage with ease
during expansion and re-engage with high fidelity upon contraction; 
(3) interactions between the constituents of the lattice and the hydrogel 
network should be extensive to maintain the integrity of the crystal- 
polymer hybrid at all times and sufficiently dynamic to minimize the 
build-up of local strain and to enable self-healing. 

With these parameters in mind, we arrived at hybrid materials 
composed of ferritin crystals integrated with the superabsorbent 
poly(acrylate-acrylamide), or p(Ac-Am), copolymer hydrogels, 
whose swelling-contraction behaviour can be modulated by the ionic
strength and pH. Ferritin is a 24-meric, quasi-spherical protein with
432 symmetry, an outer diameter of 12 nm, an inner diameter of 8 nm,
and a molecular weight of more than 500,000 Da. Human heavy-chain
ferritin forms highly ordered, face-centred cubic (fcc) crystals
that routinely grow to more than 200 µm in size and diffract to less than
2.0 Å. The fcc lattice (Fig. 1a) is characterized by a mesoporous network
consisting of cube-shaped, 6-nm-wide chambers (Fig. 1b) that are
interconnected by smaller, octahedron-shaped cavities that taper to a
pore size of about 2 nm at their narrowest (Fig. 1c), thus fulfilling condition (1).
The lattice is formed through highly specific, metal-mediated
contacts between neighbouring ferritin molecules (Fig. 1d), which are
promoted through the K86Q surface mutation to enable metal coordination.
The absence of any other interprotein contacts means that the
entire lattice bonding framework of ferritin molecules can be formed
or broken via binding or removal of metal ions (such as Ca2+), satisfying
condition (2). Finally, ferritin bears a small negative charge, with a
zeta potential ranging from −5.5 mV at pH 6.0 to −7.3 mV at pH 7.5
(Extended Data Fig. 1a, b). The exterior surface of ferritin presents a
diffuse distribution of both negatively and positively charged residues
(Extended Data Fig. 1c), which should enable uniform association with
the p(Ac-Am) network through a combination of ionic and H-bonding
interactions, thus fulfilling condition (3) (Fig. 1e).

We first examined the efficiency of molecular diffusion and polymerization
within ferritin crystals. Diffusion into single ferritin crystals was
assessed using the fluorescent tracer rhodamine B by confocal fluorescence
microscopy experiments. These experiments showed that a typical
crystal (edge length, l_edge = 50-250 µm) was completely infiltrated by
rhodamine B (Extended Data Fig. 2a), which is considerably larger
(479 g mol⁻¹) than the Ac and Am molecules (both 71 g mol⁻¹), within
15 min. In a typical preparation of crystal-hydrogel hybrids, ferritin 
crystals were incubated with polymer precursors (8.625% (w/v) sodium 
acrylate, 2.5% acrylamide and 0.2% N,N’-methylenebis(acrylamide)) 
for at least 10 h to ensure their uniform distribution in the lattice interstices.
This treatment caused no apparent damage to the crystals (see
Supplementary Information for quantification of polymer precursor 
concentrations inside the crystals). Crystals were then transferred into 
a solution containing 1% (w/v) ammonium persulfate (APS) and 1% 
(v/v) tetramethylethylenediamine (TEMED) to initiate free-radical 
polymerization within the lattice, as well as 4 M sodium chloride
(NaCl) to limit swelling during polymerization (Fig. 1e). To assess the
kinetics of polymerization inside the crystals, we added 0.3% (w/v) 
Fig. 1 | Packing arrangement in ferritin crystals and their expansion-contraction
mediated by the infused hydrogel network. a-c, The fcc
packing arrangement of ferritin crystals (Protein Data Bank identifier, 
PDB ID, 6B8F). The unit cell, the 200 plane and the 111 plane are outlined 
in black, green and red, respectively. d, Ca-mediated intermolecular 
8-hydroxypyrene-1,3,6-trisulfonic acid (pyranine) to the aforementioned
co-monomer mixture. Pyranine has been reported to become
covalently incorporated into the polymer backbone upon radical-mediated
crosslinking and undergo a shift in its emission maximum
from 512 to 420 nm. Thus, the extent of in crystallo polymerization
could be monitored through the decrease of green fluorescence intensity
(emission wavelength λemission = 500-550 nm, excitation wavelength
λexcitation = 488 nm), indicating that hydrogel formation was
complete in less than 2 min for a crystal with l_edge = 70 µm (Extended
Data Fig. 2b, c and Supplementary Video 1; see Extended Data Fig. 3
for polymer quantification via 19F nuclear magnetic resonance, NMR).
Polymerization was promptly followed by intrusion of the aqueous
NaCl solution into the crystal-hydrogel matrix, which was clearly
visualized owing to the difference between the refractive indices of
the salt solution (nD = 1.3676) and the matrix (nD ≈ 1.34) (Extended
Data Fig. 2b and Supplementary Videos 1, 2). The solvent permeation 
process typically finished within 10 min and was accompanied by a 
small but noticeable enlargement of the crystals (<5% increase in edge 
length) (Extended Data Fig. 2c). 

Full expansion of hydrogel-infused ferritin crystals was initiated by 
placing them in deionized water. As observed using light microscopy, 
the expansion of the crystals was highly isotropic and their sharply 
faceted, polyhedral morphologies were preserved even after they grew 
to > 210% of their original dimensions (Fig. 2a and Supplementary 
Video 2; see Extended Data Fig. 4a and b for additional examples), 
often without the appearance of any defects. The expansion kinetics 
was biphasic, with time constants τfast << 100 s and τslow >> 300 s (Fig. 2a).
Isotropic growth continued indefinitely, until the edges of the mate- 
rials were not discernible, but we typically stopped the process after 
<10 min, when considerable expansion had already occurred. No 
substantial release of ferritin molecules from the lattices was evident 
during the first 50 min of expansion (Extended Data Fig. 4c). Addition 
of a concentrated monovalent salt solution (NaCl or KCl) led to rapid 
dehydration and isotropic contraction of the expanded crystals to 
nearly their original size (Fig. 2a and Extended Data Fig. 5a). Recovery 




interactions between ferritin molecules in the lattice. Ca2+ ions (blue)
are coordinated by two pairs of D84 and Q86 side-chains (magenta). 

e, Schematic representation of the formation, expansion and contraction 
of ferritin crystal-hydrogel hybrids. 


of the original crystal dimensions could be achieved by further addition
of CaCl2 owing to the ability of Ca2+ to both screen the negatively
charged polymer backbone more effectively and to re-engage
specific interactions between ferritin molecules. The same effect was
observed with other divalent metal-ion salts (Extended Data Fig. 5b). 
The expansion-contraction cycle could be repeated at least eight times 
without apparent loss in amplitude and change in crystal morphology 
when a monovalent metal-salt solution was used to induce contrac- 
tion (Extended Data Fig. 6). We observed that crystals contracted 
with CaCl2 displayed considerably smaller expansion owing to the
enhanced strength of the polymer network and protein-protein inter- 
actions. In control experiments, we examined other hydrogel formula- 
tions, including hydrogels that only contained polar but non-charged 
(pAm or poly-tris(hydroxymethyl)methyl(acrylamide)) or non-polar 
(poly-N-isopropylacrylamide) side-chains (Extended Data Fig. 7). All 
of these polymers led to either dissolution or disintegration of crystals 
after initiation of in crystallo polymerization, suggesting a lack of sub- 
stantial interactions between the functional groups on these polymers 
and on the ferritin surface. 

Interestingly, pAc hydrogels promoted isotropic expansion of the 
crystals in the absence of Am co-monomers (Extended Data Fig. 7), 
indicating that carboxylate side-chains are the primary mediators of 
interactions with ferritin molecules. By contrast, treatment of ferritin 
crystals with pre-formed pAc polymers, which cannot diffuse into the 
lattice, led to crystal dissolution upon transfer into water (Extended 
Data Fig. 7c). Together, these observations confirm that (i) there are 
extensive non-covalent interactions between ferritin molecules and 
the p(Ac-Am) hydrogel matrix that preserve the structural integrity
of even highly expanded crystals, and (ii) the hydrogel matrix continuously
and uniformly pervades the entire lattice, thus promoting
cooperative transmission of any lattice deformations to enable isotropic
expansion-contraction.

We investigated the expansion-related changes in the lattice arrange- 
ment of ferritin molecules using small-angle X-ray scattering (SAXS). 
Initial experiments entailed bulk measurements of a large number 
Fig. 2 | Characterization of the expansion and contraction behaviour
of ferritin crystal-hydrogel hybrids. a, Structural evolution of a
ferritin crystal-hydrogel hybrid during the polymerization-expansion-contraction
process. Black arrows indicate the addition of different
solutions or water to the crystal. The numbered images (i-vi) in the top
panels correspond to the selected time points shown as red circles in
the bottom panel. The separation between the major ticks of the ruler is
100 µm. b, SAXS profiles of hydrogel-infused ferritin crystals during lattice
expansion, plotted against the scattering vector length q. The progression


of ferritin crystals suspended in a capillary tube. Figure 2b shows 
the evolution of the ‘powder’ SAXS pattern of more than 100 single 
p(Ac-Am)-infused crystals upon the initiation of polymerization 
through the addition of APS/TEMED in a solution that contains no
salt; thus, polymerization is immediately followed by expansion 
(see Methods for experimental details). The spectrum of the unexpanded
crystals is indicative of an fcc lattice with a unit-cell parameter
of a = 182.40 Å. The isotropic growth of the unit cell is evident from
the correlated shifts of the Bragg peaks to lower angles. The decay of 
the higher-angle peaks is considerably more rapid and is accompanied 
by the emergence of the ferritin form factor. This is consistent with the 
picture that as the crystal expands, the hydrogel matrix becomes less 
dense around the ferritin molecules, leading to their increased mobility. 
However, the (111) reflection is still evident after 20 min of expansion, 
which means that some long-range periodic order is still present when 
the unit cell has grown to a = 325 Å (Fig. 2c) and the volume of the
material has increased to 570% of its original value. 
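
The link between the unit-cell parameter, the Bragg peak positions and the volume change can be checked with a short calculation. The sketch below assumes an ideal cubic lattice, for which the peak positions are q = 2π√(h² + k² + l²)/a, and uses the fcc selection rule (h, k, l all even or all odd); the function names are illustrative and are not part of the analysis described in the Methods.

```python
import math

A_INITIAL = 182.40  # unit-cell parameter before expansion, in angstroms (from the text)
A_EXPANDED = 325.0  # unit-cell parameter after 20 min of expansion (from the text)


def fcc_allowed(h, k, l):
    """fcc selection rule: h, k, l must be all even or all odd."""
    return h % 2 == k % 2 == l % 2


def bragg_q(a, h, k, l):
    """Scattering vector length (1/angstrom) of reflection (hkl) for a cubic cell of edge a."""
    return 2.0 * math.pi * math.sqrt(h * h + k * k + l * l) / a


def volume_ratio(a_new, a_old):
    """Ratio of unit-cell volumes for an isotropic change of the cell edge."""
    return (a_new / a_old) ** 3


if __name__ == "__main__":
    for hkl in [(1, 1, 1), (2, 0, 0), (2, 2, 0), (3, 1, 1)]:
        assert fcc_allowed(*hkl)
        print(hkl, round(bragg_q(A_INITIAL, *hkl), 4), round(bragg_q(A_EXPANDED, *hkl), 4))
    print("volume ratio:", round(volume_ratio(A_EXPANDED, A_INITIAL), 2))
```

With these inputs every reflection moves to lower q by the same factor of about 1.78, and the volume ratio comes out near 5.7, in line with the roughly 570% quoted above.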

To probe the reversibility of lattice expansion, we set up a micro- 
fluidic flow cell for single-crystal SAXS experiments (Extended 
Data Fig. 8), which circumvent the inherent issues associated with 
of scattering peaks to lower angles is indicated with grey dashed lines.
Peaks corresponding to the original lattice parameters (designated with
red asterisks) are visible throughout the process. c, Changes in the unit-cell
parameter a during lattice expansion, calculated from the SAXS profiles
shown in b. The schematics correspond to the red circles and are drawn
to scale. The error bars were determined from the full-widths at half-maximum
of the scattering peaks. d, Expansion and contraction of a single
crystal, monitored using SAXS.


bulk measurements in a small capillary tube (such as sample heterogeneity
and inefficient solvent diffusion). The SAXS data in Fig. 2d
indicate that a single-crystal lattice that has expanded by 27%
(corresponding to a separation of 35 Å between neighbouring ferritin
molecules) can return to its original dimensions upon NaCl/CaCl2-induced
contraction. To examine whether this recovery also
occurs at the level of atomic periodicity, we conducted high-angle,
single-crystal X-ray diffraction (XRD) experiments at room temperature
(Fig. 3a-c). These experiments showed that crystals that
expanded by up to 40% could fully regain their native diffraction
pattern upon contraction with divalent metal-ion salts (Extended
Data Fig. 5b). With such expanded and Ca-contracted crystals,
we consistently obtained datasets with resolutions <1.15 Å at a
synchrotron source at 100 K (Extended Data Table 1). Interestingly,
the resulting crystal structures revealed two different conformational
states of the Ca2+-bridged ferritin-ferritin interfaces (Fig. 3d):
about 60% of these interfaces were found in the native configuration
(as shown in Fig. 1d, but with a well resolved Ca-coordinated
water molecule), whereas the remaining 40% presented an alternative
coordination mode for Ca2+, probably stemming from lattice
Fig. 3 | Atomic-level structural characterization of ferritin crystal-hydrogel
hybrids by XRD. a-c, XRD patterns (at temperature
T = 293 K) of a ferritin crystal infused with polymer precursors (a), after
polymerization and expansion (b) and after contraction with CaCl2 (c).
Light micrographs of the crystal are shown in the insets; the separation
between the major ticks of the ruler is 100 µm. d, 1.06-Å-resolution


rearrangements during contraction. Notably, the 1.06-Å-resolution
crystal structure (R-factors, Rwork = 9.10%; Rfree = 10.26%; estimated
coordinate error (dispersion precision indicator), 0.011 Å) is the
highest-resolution ferritin structure reported until now. Our findings
suggest that hydrogel infusion and the expansion-contraction process 
do not diminish XRD data quality and may actually improve it. 
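
The separation between neighbouring ferritin molecules quoted for the expanded lattice follows from fcc geometry. As a rough check, assuming that nearest neighbours in an fcc lattice sit a/√2 apart (centre to centre), that the native cell edge is the 182.40 Å obtained from SAXS, and that the molecules themselves do not deform (the function names are illustrative):

```python
import math

A_NATIVE = 182.40  # fcc unit-cell parameter in angstroms (from the SAXS data in the text)


def nearest_neighbour_distance(a):
    """Centre-to-centre distance between nearest neighbours in an fcc lattice of edge a."""
    return a / math.sqrt(2.0)


def added_separation(a, linear_expansion):
    """Extra gap opened between rigid neighbours when the lattice grows by the given fraction."""
    return nearest_neighbour_distance(a) * linear_expansion


if __name__ == "__main__":
    print(round(nearest_neighbour_distance(A_NATIVE), 1))  # native centre-to-centre distance
    print(round(added_separation(A_NATIVE, 0.27), 1))      # gap added by a 27% expansion
    print(round(added_separation(A_NATIVE, 0.40), 1))      # gap added by a 40% expansion
```

A 27% linear expansion adds a gap of about 35 Å between neighbours, matching the value given for the SAXS experiment, and a 40% expansion adds roughly 50 Å.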

Any local anisotropy developed during the expansion or contraction 
of the hydrogel matrix would be expected to cause dislocations in 
the embedded ferritin lattice. Indeed, exposure of hydrogel-infused 
crystals to rapid changes or temporary spatial gradients in NaCl or
CaCl2 concentrations frequently led to fracturing (see, for example,
Fig. 3b inset). However, these materials showed a remarkable ability
to self-heal, whereby the cracks were spontaneously and, in some
cases, scarlessly sealed (Fig. 4a, b and Supplementary Video 3), owing 
to the reversible bonding interactions of the hydrogel network with 
the protein molecules (Fig. 4c). It is important to note that covalently 
crosslinked hydrogels like p(Ac-Am) do not typically self-heal unless
they are modified with dynamic bonding functionalities. In the
case of our materials, the role of such functional groups is fulfilled
by the ferritin molecules, which act as interaction hubs for polymer
chains. During expansion-contraction cycles, cracks tended to reoccur
in the same loci in a given crystal (Supplementary Video 3). This 
observation suggests that the healed interfaces had not fully regained 
the original hydrogel crosslinking density of the bulk material, at 
least in the time scale (several minutes) of the experiments. Hydrogel 
integration substantially mitigated the brittleness of native ferritin 
crystals (Extended Data Fig. 9a). We observed no fragmentation, 
even in cases of substantial fracturing that propagated throughout 
the crystals, and fissures as wide as 20 µm could be closed to recover
near-native crystal morphology (Fig. 4b). The ferritin crystal-hydrogel 
hybrids had a reduced modulus of about 1 GPa, which is similar to 
that of ferritin crystals (Extended Data Fig. 9b), but several orders 
of magnitude higher than those of hydrogels. The hybrids are also
highly thermostable, maintaining their crystalline order at >80 °C
(Extended Data Fig. 9c). 

Owing to the inherent chemical tailorability of ferritin molecules, 
the crystal-hydrogel hybrids could be functionalized in different 


structure (T = 100 K; PDB ID, 6B8F) of the contracted ferritin
crystal-hydrogel hybrid, showing the electron density surrounding the
Ca-mediated ferritin-ferritin interfaces and highlighting the two observed
Ca coordination conformations. The electron density (2Fo-Fc) map (grey)
is contoured at 1.5σ. Water molecules and Ca ions are shown as red and
blue spheres, respectively.


ways. They could be constructed from ferritin molecules with mineralized
ferrihydrite in their interior cavity (Extended Data Fig. 9d and
Supplementary Video 4), thus exploiting ferritin's native function as a
ferroxidase, or with fluorescent tags covalently attached to their exterior
(Fig. 4d, e and Supplementary Video 4). Additionally, spatially
differentiated, core-shell crystals were created using a layer-by-layer
growth method (Fig. 4d, e). When infused with p(Ac-Am), such nanoparticle-
or fluorophore-functionalized lattices displayed the same
isotropic expansion-contraction behaviour as non-functionalized
ones. The layer-by-layer growth process was further modified 
whereby the core lattice domain (labelled with rhodamine groups) 
was first covalently fixed through the chemical crosslinking of 
ferritin molecules with glutaraldehyde, followed by the growth of an 
uncrosslinked, unlabelled shell layer and the incorporation of the 
p(Ac-Am) polymer into the composite lattice. Hydration of such 
‘fixed core/expandable shell’ crystals led to complete fragmentation 
of the shell layer due to the strain generated at the mechanically mis- 
matched core-shell interface, exposing the morphologically unaltered 
core layer (Fig. 4e). These examples highlight the facility with which 
chemical and mechanical patterning are achieved in protein crystal- 
hydrogel hybrids. 

We have reported here a new form of materials that integrate macromolecular
protein crystals with synthetic polymer networks. These
hybrids seamlessly combine the structural order and periodicity of
crystals, the adaptiveness and tunable mechanical properties of polymeric
networks and the chemical versatility of protein building blocks.
Additionally, the ability to reversibly expand-contract crystal lattices 
and mobilize their protein components may provide a new means 
to improve XRD quality and explore otherwise inaccessible protein 
structural states using three-dimensional protein crystallography. 
Protein crystals are often highly porous, sometimes containing up to 
90% solvent, and are usually assembled through weak, non-covalent 
packing interactions; therefore, our approach should be applicable to 
other protein lattices. Their potential for generalizability, coupled with 
the chemical tailorability of synthetic polymers and the genetic muta- 
bility of proteins, should make protein crystal-hydrogel hybrids a rich 
medium for materials science. 
Fig. 4 | Self-healing behaviour and functionalization of ferritin
crystal-hydrogel hybrids. a, Light microscopy images of crystal-hydrogel 
hybrids, showing the self-healing of cracks that appear during Ca-induced 
contraction. b, Extensive cracks can also appear during crystal expansion 
or during the initial stages of NaCl-induced contraction, but eventually 
self-heal. The arrows point to the termini of the major crack extending 
through the crystal. c, Schematic of crack formation and self-healing 
Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0057-7. 
METHODS 


Protein expression, purification and characterization. The plasmid for the ΔC* 
variant of human heavy-chain ferritin (HuHF), devoid of all native cysteine 
residues (C90E, C102A and C130A), was obtained via site-directed mutagenesis as 
previously described. Expression and purification of ΔC* were performed 
according to the previously published protocol. 

Determination of zeta potentials. Purified ferritin was concentrated to 
about 200 µM and exchanged into a buffer solution containing 50 mM 
2,2-Bis(hydroxymethyl)-2,2′,2″-nitrilotriethanol (Bis-Tris) (pH 6.0), 50 mM 
4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) (pH 7.0) or 50 mM 
HEPES (pH 7.5) using an Amicon Ultra centrifugal filter unit (10 kDa cutoff). The 
zeta potentials of ferritin in the three different buffers were determined using a 
Zetasizer Nano ZS90 (Malvern Instruments). Experimental runs were performed 
to collect 12 datasets with a He-Ne laser at 633 nm. 

Formation of crystal-hydrogel hybrids. Polymer precursor solution. 25 mM 
HEPES (pH 7.0), 30 mM CaCl₂, 917 mM (8.625% w/v) sodium acrylate, 352 mM 
(2.5% w/v) acrylamide and 13 mM (0.2% w/v) N,N′-methylenebis(acrylamide). 
Polymerization solution. 4 M NaCl, 1% (w/v) APS and 1% (v/v) TEMED. 
Octahedron-shaped ferritin crystals formed over 1-2 days in a buffered solution 
containing 25 mM HEPES (pH 7.0), 3-14.5 µM protein (per 24meric ferritin cage) 
and 4.5-7.5 mM CaCl₂. Once the ferritin crystals matured, the crystallization 
solution was replaced with the polymer precursor solution. Crystals were soaked for 
more than 10 h to ensure full infusion of the monomers into the ferritin crystals, 
and were then individually transferred with a mounted CryoLoop (Hampton) 
to the polymerization solution for 5 min, initiating in crystallo polymerization. 
Alternatively, the crystallization solution was replaced with the polymerization 
solution for the bulk polymerization of many crystals at once. 
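The paired molar and % (w/v) values quoted for the polymer precursor solution can be cross-checked with a short calculation. The sketch below is illustrative only: the molar masses are standard literature values that are not stated in the text, and the function name is hypothetical.

```python
# Standard molar masses in g/mol (assumed here; not given in the Methods).
MOLAR_MASS = {
    "sodium acrylate": 94.04,
    "acrylamide": 71.08,
    "N,N'-methylenebis(acrylamide)": 154.17,
}

# Concentrations quoted in the Methods, in % (w/v).
PERCENT_WV = {
    "sodium acrylate": 8.625,
    "acrylamide": 2.5,
    "N,N'-methylenebis(acrylamide)": 0.2,
}


def percent_wv_to_mM(percent_wv, molar_mass):
    """Convert % (w/v), that is grams per 100 ml, to millimolar."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1000.0


for name, percent in PERCENT_WV.items():
    mM = percent_wv_to_mM(percent, MOLAR_MASS[name])
    print(f"{name}: {percent}% (w/v) = {mM:.1f} mM")
```

Rounded to the nearest millimolar, the three results agree with the 917 mM, 352 mM and 13 mM stated for the precursor solution.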

Measurement of the rate of diffusion into ferritin crystals. A large ferritin crystal 
was transferred with a mounted CryoLoop onto a glass slide, and 20 µl of a solution 
containing 20 µM (0.01 mg ml⁻¹) rhodamine B, 30 mM CaCl₂ and 25 mM HEPES 
(pH 7.0) was added to the crystal. The rhodamine diffusion process was monitored 
with a 10× air objective installed on a spinning-disk confocal Axio Observer 
inverted microscope (Zeiss) equipped with a pair of Quantum 5125C cameras 
(Roper), using a filter to collect light at 575-650 nm (red channel). Differential 
interference contrast (DIC) and fluorescence (564 nm excitation) images were 
captured at 1-s intervals with a 10-ms exposure. Images were collected in Slidebook 6 
(Intelligent Imaging Innovations) and analysed using Fiji (http://fiji.sc/Fiji). 

Determination of in crystallo polymerization dynamics. Ferritin crystals were 
incubated in a polymer precursor solution supplemented with 5.7 mM (0.3%) 
pyranine (Sigma-Aldrich). After 12 h, an individual crystal was transferred onto a 
glass slide and polymerization was initiated by adding 10 µl of the polymerization 
solution. Hydrogel polymerization throughout the crystal and the corresponding 
decrease of pyranine fluorescence were monitored with a 20× air objective on the 
confocal microscope as described above, using a filter to collect light at 500-550 nm 
(green channel). DIC and fluorescence (488 nm excitation) images were captured 
at 1-s intervals with 100-ms (DIC) and 1-s (fluorescence) exposures. 

Scanning electron microscopy of ferritin crystals. Native ferritin crystal and 
crystal-hydrogel hybrid samples were deposited onto glow-discharged, Formvar/ 
carbon-coated Cu grids (Ted Pella Inc.). Each grid was blotted with filter paper to 
remove excess liquid. Grids were mounted onto a STEM 12x v2 sample holder and 
imaged using a Sigma 500 scanning electron microscope (Zeiss) at an accelerating 
voltage of 1 kV using a 30-µm aperture. 

Polymer quantification with ¹⁹F NMR. Large-scale crystallization of ferritin was 
carried out in a 24-well culture plate (Costar). 100 µl of 25 µM ferritin (in 15 mM 
Tris (pH 7.4) and 150 mM NaCl) was combined with 100 µl of a buffered solution 
containing 50 mM HEPES (pH 7.0) and 12 mM CaCl₂. Crystals formed overnight 
and matured over 72 h. The solution in each well was replaced with 100 µl of a 
polymer precursor soaking solution containing 25 mM HEPES (pH 7.0), 30 mM 
CaCl₂, 179.9 mM 2-(trifluoromethyl)acrylic acid, 744.8 mM sodium acrylate, 
350.7 mM acrylamide and 20.4 mM N,N′-methylenebis(acrylamide). After soaking 
overnight, this solution was removed, and the crystals were washed with a 
buffered solution (25 mM HEPES, pH 7.0; 30 mM CaCl₂) to remove unincorporated 
monomers. Polymerization was initiated by replacing the washing solution 
with 100 µl of the polymerization solution. After 10 min, the crystals were 
transferred into an Eppendorf tube and centrifuged at 2,000g for 60 s. The supernatant 
was decanted, and the crystals were resuspended in 1 ml D₂O. Concentrated HCl 
was added until the pH of the solution was approximately 4.0 to facilitate crystal 
decomposition. 7051 of this solution was transferred into an NMR tube and 
supplemented with 4.6 mM trifluoroacetic acid. The ¹⁹F-NMR spectrum was 
collected using a 300 MHz Bruker AVA spectrometer with a ¹⁹F probe (Extended Data 
Fig. 3). The peak at −64.94 p.p.m. corresponds to free 2-(trifluoromethyl)acrylic 
acid, the cluster of peaks near −67.07 p.p.m. to 2-(trifluoromethyl)acrylic acid 
that has been incorporated into the polymer, and the peak at −75.51 p.p.m. to the 
trifluoroacetic acid standard. From the integration of these peaks it was deduced 
that (a) the total concentration of 2-(trifluoromethyl)acrylic acid in the solution 
was 8.2 mM and (b) 74.7% of the monomeric precursor was incorporated into the 
polymer matrix inside the crystals. The protein concentration was determined to 
be 60.0 µM using the Bradford assay, and the molar ratio of 2-(trifluoromethyl) 
acrylic acid to ferritin was calculated as 137:1. Given this ratio and the fact that 
each unit cell of the ferritin crystals contains four ferritin cages and has a volume 
of about 5,832 nm³ (a = 18 nm), the concentration of 2-(trifluoromethyl)acrylic 
acid in the crystal lattice was calculated as 155.6 mM, which is very similar to its 
concentration (179.9 mM) in the soaking solution. 
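The arithmetic behind the in-lattice concentration can be retraced in a few lines. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative, and the Avogadro constant is the only value not taken from the text.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (standard value, not stated in the Methods)

# Values quoted in the Methods.
monomer_total_mM = 8.2   # total 2-(trifluoromethyl)acrylic acid in the NMR sample
ferritin_uM = 60.0       # ferritin concentration from the Bradford assay
cell_edge_nm = 18.0      # cubic unit-cell edge a
cages_per_cell = 4       # ferritin cages per unit cell

# Molar ratio of monomer to ferritin cage in the dissolved sample.
ratio = monomer_total_mM * 1000.0 / ferritin_uM

# Ferritin concentration inside the lattice: cages per cell volume, in mol per litre.
cell_volume_litre = cell_edge_nm ** 3 * 1e-24  # 1 nm^3 = 1e-24 l
ferritin_lattice_mM = cages_per_cell / (AVOGADRO * cell_volume_litre) * 1000.0

# Monomer concentration in the lattice follows from the molar ratio.
monomer_lattice_mM = ratio * ferritin_lattice_mM

print(f"monomer:ferritin ratio = {ratio:.0f}:1")
print(f"ferritin in lattice    = {ferritin_lattice_mM:.2f} mM")
print(f"monomer in lattice     = {monomer_lattice_mM:.1f} mM")
```

Carrying the unrounded ratio (about 136.7) through the calculation gives a value within rounding of the reported 155.6 mM; using the rounded ratio of 137 gives about 156 mM.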

Monitoring crystal expansion-contraction using light microscopy. Single crystals 
were transferred with a mounted CryoLoop onto a glass slide with a microscopic 
ruler (OMAX). All images and videos were obtained on an SZX7 (Olympus) 
microscope equipped with an Infinity 1 charge-coupled device (CCD; Lumenera). For 
crystals that had not been polymerized, 10 µl of the polymerization solution was 
carefully added to minimize crystal movement. This solution was removed before 
water addition. For previously polymerized crystals, water (Milli-Q, 30 µl) was 
added and crystal expansion was observed over 5-20 min. To initiate crystal 
contraction, water was replaced with a solution containing either 4 M NaCl or 1 M 
CaCl₂. This expansion-contraction cycle could be repeated at least eight times for a 
crystal if NaCl was used to induce crystal contraction. Crystal size was determined 
by measuring the edge length of a facet relative to the microscopic ruler using the 
Fiji image processing package. 

Monitoring crystal expansion using confocal microscopy. Crystals containing 
polymer precursors were prepared as described above. One of these crystals was 
transferred onto a glass slide and imaged on a confocal microscope. After capturing 
an initial image, the crystal was polymerized in 10 µl of the polymerization solution, 
and its expansion in 30 µl of water was monitored. DIC images were captured at 
different time intervals with a 100-ms exposure until the crystal was no longer visible. 

Quantification of protein release during expansion. Large-scale crystallization 
of ferritin was carried out as described above. Once crystals fully matured, the 
well solution was replaced with 100 µl of the polymer precursor solution. After 
12 h, the crystals were all combined into a single Eppendorf tube and 500 µl of 
the polymerization solution was added. Crystals were expanded by replacing the 
polymerization solution with 1 ml of water. During this experiment, aliquots (100 µl) 
of the protein solution were removed and replaced with 100 µl of water, and each 
aliquot was used to determine the protein concentration using the Bradford assay. 

Multi-crystal expansion monitored using SAXS. Crystals for multi-crystal 
small-angle X-ray scattering were prepared as described above and transferred into 
the polymer precursor solution. A large number (n > 100) of crystals were 
transferred to an Eppendorf tube. After the crystals had settled at the bottom, they were 
transferred, along with 50 µl of solution, into a 1.5-mm quartz capillary (Hampton). 
Crystals in capillaries were analysed at beamline 5-ID-D of the Advanced Photon 
Source (Argonne National Laboratory). Data were collected using collimated X-ray 
radiation (0.7293 Å, 17 keV) calibrated with both a glassy carbon standard and a 
silicon diffraction grating. After the sample was mounted on the instrument, a thin 
tube (with a diameter of 0.51 mm) was inserted into the capillary to facilitate the 
addition of 50 µl of solution with a syringe injector during X-ray exposure. The 
injected solution contained a more concentrated polymerization solution without 
NaCl (2% APS and 2% TEMED) in water. After the first exposure, the solution was 
injected, and an image with a 1-s X-ray exposure was collected every 30 s. Peaks 
corresponding to the original lattice were visible throughout the process, indicating 
that some of the crystals in the bulk sample did not expand. This is probably due to 
limited solvent diffusion or incomplete polymerization within the capillary tubes 
used for the SAXS experiments. It is important to note that in this procedure, 
‘polymerized’ crystals immediately began expanding upon the commencement 
of data collection. The reason for this experimental strategy (instead of 
polymerization in a high-ionic-strength solution, followed by the initiation of expansion 
through lowering the ionic strength) is that it was not possible to sufficiently dilute 
the high-ionic-strength polymerization solution in the thin capillary tubes used 
for SAXS (which cannot accommodate addition of large volumes of solution) to 
enable expansion. 

Scattered radiation was detected using a CCD area detector and one-dimensional 
scattering data were obtained through the azimuthal averaging of the 
two-dimensional data to produce plots of the scattering intensity as a function of 
the scattering vector length, q = 4π sin(θ)/λ, where θ is one-half of the scattering 
angle and λ is the wavelength of the X-rays used. Analysis of the one-dimensional 
data was performed using the powder diffraction processing software JADE (MDI) 
or Origin (OriginLab). 
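The relationship between photon energy, wavelength and the scattering vector length can be illustrated with a short calculation. This is a sketch only: the function names are hypothetical, the value of hc is the standard constant, and the example scattering angle is arbitrary rather than a measured value.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # h*c in keV·Å (standard constant)


def wavelength_angstrom(energy_keV):
    """X-ray wavelength in Å for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_keV


def q_from_scattering_angle(two_theta_deg, wavelength):
    """Scattering vector length q = 4*pi*sin(theta)/lambda, theta = half the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def d_spacing(q):
    """Real-space periodicity d = 2*pi/q (same length unit as the wavelength)."""
    return 2.0 * math.pi / q


for energy in (17.0, 11.0):  # photon energies quoted for the two beamlines
    print(f"{energy:.0f} keV -> {wavelength_angstrom(energy):.4f} Å")

# Arbitrary example scattering angle (assumption, for illustration only).
lam = wavelength_angstrom(17.0)
q = q_from_scattering_angle(0.4, lam)
print(f"q = {q:.5f} Å^-1, d = {d_spacing(q):.1f} Å")
```

The two computed wavelengths reproduce the 0.7293 Å (17 keV) and 1.1271 Å (11 keV) values quoted for the SAXS experiments.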

Multi-crystal SAXS at elevated temperatures. Large-scale crystallization of 
ferritin was performed as described above. The crystallization solution was removed, 
and ferritin crystals were resuspended in either the polymer precursor solution or 
a buffered solution containing 25 mM HEPES (pH 7.0) and 30 mM CaCl₂. After 
72 h, the polymer precursor soaking solution was replaced with the polymerization 
solution. After 10 min, this was also replaced with a buffered solution containing 
25 mM HEPES (pH 7.0), 4 M NaCl and 30 mM CaCl₂. Both samples, containing 
either native ferritin crystals or the crystal-hydrogel hybrids, were transferred 
into 1.5-mm quartz capillaries (Hampton). Data were collected at beamline 4-2 of 
SSRL using collimated X-ray radiation (1.1271 Å, 11 keV) calibrated with a silver 
behenate standard. The samples were heated using a custom-built thermal stage 
operating at 1 °C min⁻¹, and images with a 1-s X-ray exposure were collected every 
minute. Scattered radiation was detected using a Pilatus3 X 1M detector (Dectris) 
and processed as described above. 

Single-crystal SAXS. Crystals containing the polymer precursors were prepared 
and polymerized using the polymerization solution as described above, and were 
analysed at SSRL (beamline 4-2). Single crystals were harvested with a mounted 
CryoLoop and transferred into a 2 M NaCl solution in the 400-µm-diameter central 
well of a custom-made microfluidic chip (Extended Data Fig. 8a, b). The microfluidic 
chip was sealed with a coverslip, attached to a syringe injector and mounted on 
beamline 4-2 at SSRL for data collection. Data were collected using collimated X-ray 
radiation (1.127 Å, 11 keV) calibrated with a silver behenate standard. Water was 
injected into the microfluidic chip at 11 s to initiate expansion, and 0.5-s-exposure 
images were taken approximately every 2.5 s for 4 min. After the data acquisition for 
crystal expansion was complete, the process was repeated (in the order 4 M NaCl, 
water, 1 M CaCl₂, water) to monitor repeated contraction and expansion processes. Data 
were collected using a Pilatus3 X 1M detector (Dectris). The unit-cell parameters 
were determined by calculating the radial distance of individual reflections, after 
fitting the spot intensity to a two-dimensional Gaussian surface. 
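For a cubic lattice, the step from the position of an indexed reflection to the unit-cell edge can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes the spot position has already been converted to a scattering vector length q, the function names are hypothetical, and the cell edge of 180.40 Å used for the round-trip check is the value listed for the contracted crystal A in Extended Data Table 1.

```python
import math


def is_allowed_fcc(hkl):
    """Face-centred lattices (such as F432) only show reflections with h, k, l all odd or all even."""
    parities = {index % 2 for index in hkl}
    return len(parities) == 1


def q_for_reflection(cell_edge, hkl):
    """Expected q of reflection (h, k, l) for a cubic cell: q = 2*pi*sqrt(h^2+k^2+l^2)/a."""
    h, k, l = hkl
    return 2.0 * math.pi * math.sqrt(h * h + k * k + l * l) / cell_edge


def cubic_cell_edge(q, hkl):
    """Cubic cell edge a recovered from the measured q of an indexed reflection."""
    h, k, l = hkl
    return 2.0 * math.pi * math.sqrt(h * h + k * k + l * l) / q


# Round trip with the crystal A cell edge (Å) from Extended Data Table 1.
a_reference = 180.40
for hkl in [(1, 1, 1), (2, 0, 0), (2, 2, 0)]:
    q = q_for_reflection(a_reference, hkl)
    print(hkl, f"allowed={is_allowed_fcc(hkl)}", f"q={q:.5f} Å^-1",
          f"a={cubic_cell_edge(q, hkl):.2f} Å")
```

In an expansion experiment, the same reflection moves to smaller q as the lattice swells, so tracking q over time for one indexed spot gives the cell edge as a function of time.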

Single-crystal XRD at room temperature. Crystals containing polymer precursors 
were prepared and imaged using light microscopy as described above. A single crystal 
was transferred onto a MicroMount precision tool (MiTeGen) with a 100-µm 
aperture and sealed with a MicroRT capillary (MiTeGen). Data were acquired on 
an APEX II CCD diffractometer (Bruker) using Cu Kα radiation (1.5418 Å) at 
295 K. Three images (60-s exposure) were collected at rotation angles φ = 0°, 60° 
and 120°. The crystal was removed from the instrument and soaked in 10 µl of the 
polymerization solution for 2 min. The crystal was transferred onto a microscopic 
ruler and 30 µl of water was added. Crystal expansion was measured over 3 min. 
This crystal was returned to the MicroMount with the MicroRT capillary and an 
identical three-image dataset was collected. This process was repeated using 30 µl of 
a 1 M CaCl₂ solution. After the crystal had contracted (1 min), another three-image 
dataset was collected. Images were analysed with the Apex III software (Bruker). 

Single-crystal XRD at 100 K. Crystal-hydrogel hybrids were prepared and imaged 
using light microscopy as described above. Two crystals were harvested, 30 µl of 
water was added, and crystal expansion was monitored over 5 min for both crystals. 
After 5 min, the water was removed and 30 µl of a solution containing either 1 M 
CaCl₂ (crystal A) or 4 M NaCl (crystal B) was added. Crystal B was re-expanded 
in 30 µl of water. After 5 min, the water was replaced with 30 µl of a 1 M CaCl₂ 
solution to contract crystal B. After contraction, both crystals were cryoprotected in 
perfluoropolyether (Hampton) and frozen in liquid N₂. Single-crystal XRD data 
for the contracted ferritin crystals were collected at 100 K at beamline 9-2 of SSRL 
using 0.98-Å radiation. The data were integrated using iMosflm and scaled with 
Aimless (Extended Data Table 1). The structures for crystal A and crystal B were 
determined at resolutions of 1.06 Å and 1.13 Å, respectively. Molecular replacement 
was performed with Phaser using the HuHF structure (PDB ID, 5CMQ) as a 
search model. Rigid-body, positional, anisotropic thermal and atom-occupancy 
refinements were carried out using Phenix. Coot was used for iterative manual 
model building. The interstitial solvent content was calculated by subtracting 
the volume of the inner cavity of ferritin (calculated using VOIDOO) from the 
solvent volume of each crystal. All figures were produced with PyMOL. 

Nanoindentation measurements of crystals. The mechanical properties of the 
native ferritin crystals and the crystal-hydrogel hybrids were determined using a 
Hysitron TI 950 Triboindenter test instrument (Bruker). All crystals were dried 
before the indentation experiments. A Berkovich probe (TI-0039, 142.3°, 100 nm 
tip radius) was used to determine the hardness and reduced modulus of the native 
crystals and crystal-hydrogel hybrids. Experiments were conducted in 
displacement control mode using a displacement of 1,000 nm. 
Preparation of iron-loaded ferritin. Iron-loaded ferritin was prepared by adding 
10.8 ml of 10 mM (NH₄)₂Fe(SO₄)₂ over 2 h to 144.8 ml of a vigorously stirring 
solution containing 1 µM ferritin, 15 mM Tris (pH 7.4) and 150 mM NaCl. 
Subsequently, the solution was stirred for an additional hour before being 
concentrated to about 3 ml using a 10-kDa Amicon membrane. A 10DG column 
(Bio-Rad) was used to remove any unbound iron. The iron content was assessed 
using a 2,2′-bipyridine-based colorimetric assay and the protein concentration 
was determined using the Bradford assay. Each ferritin cage contained about 800 
Fe atoms. 

Formation of core-shell ferritin crystals. Expandable core/expandable shell 
crystals. Mature ferritin crystals were transferred to a buffered solution 
containing 25 mM HEPES (pH 7.0), 30 mM CaCl₂ and 1.9 mM (1 mg ml⁻¹) 5-(and 6)- 
carboxytetramethyl rhodamine succinimidyl ester (NHS-rhodamine; Thermo 
Fisher Scientific). After soaking for 12 h, an individual crystal was removed and 
washed three times in a buffered solution containing 25 mM HEPES (pH 7.0) 
and 30 mM CaCl₂ to remove unbound NHS-rhodamine. The crystal was 
transferred to a well containing 10 µl of 12.5 µM ferritin, 25 mM HEPES (pH 7.0) and 
6 mM CaCl₂. A transparent layer of ferritin formed around the rhodamine-labelled 
ferritin crystal over 12 h (creating a red core and a transparent shell). This crystal 
was soaked in a polymer precursor solution and polymerized as described above 
to yield an expandable core/expandable shell crystal. 

Fixed core/expandable shell crystals. Fixed core/expandable shell ferritin crystals 
were prepared similarly to the expandable core/expandable shell crystals described 
above. The only difference was that after the rhodamine labelling step, the crystal 
was transferred into a solution containing 2.5% (v/v) glutaraldehyde, 25 mM 
HEPES (pH 7.0) and 30 mM CaCl₂. After 12 h, the crystal was washed five times 
with water to remove unbound glutaraldehyde, followed by the epitaxial growth 
of the transparent layer of ferritin crystals on top of the core layer in a fresh 
crystallization solution containing 12.5 µM ferritin. This crystal was then soaked in a 
polymer precursor solution and polymerized as described above to yield a fixed 
core/expandable shell crystal. 

Data availability. Crystal structures have been deposited in the Research 
Collaboratory for Structural Bioinformatics Protein Data Bank under accession 
codes 6B8F (ferritin-polymer hybrid crystal 1; 
https://www.rcsb.org/structure/6b8f) and 6B8G (ferritin-polymer hybrid crystal 2; 
https://www.rcsb.org/structure/6b8g). Raw data are available from F.A.T. 
Extended Data Fig. 1 | Distribution of electrostatic charge on the 
surface of ferritin and size distribution of ferritin in solution. 
a, pH-dependent zeta potentials of ferritin, determined by dynamic-light- 
scattering measurements. b, Dynamic-light-scattering profile of ferritin 
(200 µM) in a solution of 50 mM HEPES (pH 7.0). c, Representation of the 
electrostatic charge distribution on the ferritin surface, as viewed along the 
two-, three- and four-fold symmetry axes. Positive (+5kBT/e) and negative 
(−5kBT/e) charges are shown in blue and red, respectively. kB, Boltzmann 
constant; e, electron charge. 
Extended Data Fig. 2 | Molecular diffusion and polymerization in 
ferritin crystals, monitored using confocal microscopy. a, Diffusion 
of rhodamine B into a ferritin crystal over 15 min. b, c, In crystallo 
polymerization of the hydrogel network, monitored through the decrease 
of integrated pyranine fluorescence (green fluorescence channel). The 
corresponding bright-field (DIC) images show the diffusion of the 
aqueous NaCl solution into the crystal. The ring-shaped diffusion front 
becomes evident at time t = 108 s and disappears by t = 216 s. The crystal 
expands by approximately 5% (edge length) during polymerization. Scale 
bars in a and b correspond to 100 µm. d, Scanning electron microscopy 
images of native ferritin crystals (top) and crystal-hydrogel hybrids 
(bottom). 
Extended Data Fig. 3 | Quantification of an acrylic acid analogue 
using ¹⁹F NMR. a, ¹⁹F-NMR spectrum, showing peak assignments for 
the trifluoroacetic acid standard, free 2-(trifluoromethyl)acrylic acid, and 
2-(trifluoromethyl)acrylic acid incorporated into the polymer. 
b, Diagram illustrating the experimental protocol for the quantification 
of 2-(trifluoromethyl)acrylic acid uptake into the crystal lattice. The 
concentration of the 2-(trifluoromethyl)acrylic acid in the crystal lattice 
(155.6 mM) is approximately the same as its concentration in the soaking 
solution (see Methods for details). 
Extended Data Fig. 4 | Isotropic hyperexpansion of ferritin crystal- 
hydrogel hybrids. a, b, Continuous expansion of two different 
crystal-hydrogel hybrids in deionized water, monitored using confocal 
microscopy. Crystal facets are still discernible after expansion for more 
than 2 h. Scale bars correspond to 100 µm. c, Ferritin release into the 
solution from expanding crystal-hydrogel hybrids (n > 10,000) over 
about 4 h. Negligible ferritin release is observed until about 1 h. Protein 
concentrations were determined using the Bradford assay. d, Confocal 
microscopy images of highly expanded crystal-hydrogel hybrids, showing 
the structural deterioration of the facets and the edges. 
Extended Data Fig. 5 | Expansion and contraction behaviour of crystal- 
hydrogel hybrids in the presence of different metal ions. a, Light 
micrographs of the crystal-hydrogel hybrids at different stages of 
expansion and contraction in response to different metal ions. b, XRD 
patterns (T = 273 K) of expanded crystal-hydrogel hybrids, acquired upon 
contraction with different metal ions. Contraction with divalent cations 
(Ca, Mg, Cd, Zn, Ni and Co) reproducibly leads to the recovery of the full 
atomic-level order, whereas contraction with monovalent cations (Li, Na 
and K) only reinstates low-order diffraction peaks. 
Extended Data Fig. 6 | Successive expansion-contraction cycles for a 
single ferritin crystal-hydrogel hybrid. Light micrographs of a hybrid 
crystal at pre- and post-expansion stages in each cycle are shown on the 
left, and the corresponding changes in edge length upon expansion- 
contraction are shown on the right. The separation between the major 
the first expansion cycle, which we ascribe to residual CaCl₂ (which forms 
strong polymer-polymer and protein-protein interactions) remaining 
in the solution that is transferred on the loop along with the crystal. The 
subsequent variability in the rate and extent of expansion is attributed to 
the different amounts of residual NaCl transferred in each cycle. 
Extended Data Fig. 7 | Alternative hydrogel formulations. a, Alternative 
monomer combinations that yield successful in crystallo polymerization 
and crystal expansion. b, Monomer combinations that lead to crystal 
dissolution during polymerization. c, A crystal soaked in a solution 
containing polyacrylate (molecular weight, Mw = 2,100 Da) dissolves upon 
being transferred into water. The separation between the major ticks of the 
ruler is 100 µm. MBAm, N,N′-methylenebis(acrylamide). 
Extended Data Fig. 8 | SAXS imaging of a single crystal-hydrogel 
hybrid in a microfluidic chip. a, Schematic diagram of the microfluidic 
chip. b, Side-view representations of the microfluidic chip. c, Photograph 
of the microfluidic chip, mounted on beamline 4-2 at SSRL. d, Single- 
crystal SAXS diffraction patterns observed at different stages of crystal 
expansion and contraction. The Miller indices of each visible spot are 
indicated. Reflections with the highest signal-to-noise ratio (I/σI) are 
circled in red. e, Spot profiles of the highest-I/σI reflections indicated in d. 
Extended Data Fig. 9 | Mechanical and thermal properties of ferritin 
crystal-hydrogel hybrids. a, Light-microscopy images showing the 
fragmentation of a native ferritin crystal and of a crystal-hydrogel hybrid 
upon application of external force with a needle at the location indicated 
with the arrow. The separation between the major ticks of the ruler is 
100 µm. b, Temperature dependence of the SAXS profiles of native ferritin 
crystals and crystal-hydrogel hybrids. The small-angle reflections (that 
is, periodic order) in both samples are maintained at 80 °C (the maximal 
temperature experimentally attainable). c, Determination of the hardness 
and reduced modulus of native ferritin crystals and crystal-hydrogel 
hybrids using atomic force microscopy nanoindentation measurements. 
d, Light-microscopy images showing the expansion and contraction of 
a crystal-hydrogel hybrid containing Fe-loaded ferritin molecules. The 
separation between the major ticks of the ruler is 100 µm. 
Extended Data Table 1 | X-ray data collection and refinement statistics 

                                    Crystal A (6B8F)    Crystal B (6B8G) 
Data collection 
Space group                         F432                F432 
Cell dimensions 
  a, b, c (Å)                       180.40              179.95 
  α, β, γ (°)                       90                  90 
Resolution (Å)                      63.65-1.06          63.62-1.13 
Number of unique reflections        111189              92912 
Multiplicity                        14.0 (2.4)          32.2 (13.8) 
CC1/2                               0.999 (0.897)       0.999 (0.628) 
Rmerge                              0.060 (0.222)       0.131 (1.131) 
<I/σI>                              24.8 (3.2)          19.1 (2.3) 
Completeness (%)                    99.0 (86.4)         100 (100) 
Average mosaicity (°)               0.17                0.31 
Total solvent content (%)           57.42               57.36 
Interstitial solvent content (%)    39.72               39.62 
Refinement 
Rwork / Rfree                       0.0910/0.1026       0.1029/0.1213 
Number of atoms 
  Protein                           1687                1699 
  Ligand/ion                        13                  13 
  Water                             340                 372 
B-factors (Å²) 
  Protein                           8.51                9.81 
  Ligand/ion                        10.19               11.96 
  Water                             21.83               23.75 
Root-mean-square deviations 
  Bond lengths (Å)                  0.013               0.011 
  Bond angles (°)                   1.317               1.241 
MolProbity score                    1.20                1.38 
Clashscore                          4.12                6.99 
Ramachandran plot (%) 
  Favoured                          98.82               98.82 
  Outliers                          0.00                0.00 
Rotamers (%) 
  Favoured                          97.35               96.88 
  Poor                              0.53                0.52 
DPI (Å)                             0.011               0.015 

Numbers in parentheses correspond to the highest resolution shell. Rmerge and CC1/2 are measurements used to determine an appropriate high-resolution limit for XRD data. 
Synthesis, structure and reaction chemistry of a 
nucleophilic aluminyl anion 


Jamie Hicks¹, Petra Vasko, Jose M. Goicoechea¹* & Simon Aldridge¹* 


The reactivity of aluminium compounds is dominated by their 
electron deficiency and consequent electrophilicity; these 
compounds are archetypal Lewis acids (electron-pair acceptors). 
The main industrial roles of aluminium, and classical methods 
of synthesizing aluminium-element bonds (for example, 
hydroalumination and metathesis), draw on the electron deficiency 
of species of the type AlR₃ and AlCl₃. Whereas aluminates, 
[AlR₄]⁻, are well known, the idea of reversing polarity and using 
an aluminium reagent as the nucleophilic partner in bond- 
forming substitution reactions is unprecedented, owing to the 
fact that low-valent aluminium anions analogous to nitrogen-, 
carbon- and boron-centred reagents of the types [NX₂]⁻, 
[CX₃]⁻ and [BX₂]⁻ are unknown. Aluminium compounds 
in the +1 oxidation state are known, but are thermodynamically 
unstable with respect to disproportionation. Compounds of this 
type are typically oligomeric, although monomeric systems 
that possess a metal-centred lone pair, such as Al(Nacnac)ᴰⁱᵖᵖ 
(where (Nacnac)ᴰⁱᵖᵖ = (NDippCR)₂CH and R = ᵗBu, Me; 
Dipp = 2,6-ⁱPr₂C₆H₃), have also been reported. Coordination 
of these species, and also of (η⁵-C₅Me₅)Al, to a range of Lewis 
acids has been observed, but their primary mode of reactivity 
involves facile oxidative addition to generate Al(III) species. 
Here we report the synthesis, structure and reaction chemistry of 
an anionic aluminium(I) nucleophile, the dimethylxanthene- 
stabilized potassium aluminyl [K{Al(NON)}]₂ (NON = 4,5-bis(2,6- 
diisopropylanilido)-2,7-di-tert-butyl-9,9-dimethylxanthene). 

This species displays unprecedented reactivity in the formation 
of aluminium-element covalent bonds and in the C-H oxidative 
addition of benzene, suggesting that it could find further use in both 
metal-carbon and metal-metal bond-forming reactions. 

Computational studies suggest that, relative to the corresponding 
boryl anion, heterocyclic (diamido)aluminyl systems of the type 
[Al(NRCH)₂]⁻ should have a higher energetic separation between the 
singlet ground electronic state and the triplet excited state (ΔE), and 
a lower energy associated with the group-13-centred lone pair. 
Despite this, no such systems have been experimentally realized. By 
making use of the steric demands and σ-electron-withdrawing 
properties of bulky arylamido substituents, together with a flexible chelating 
dimethylxanthene backbone, we have observed that aluminyl 
compounds can be synthesized that are stable up to 300 K, amenable to 
structural characterization in the solid state, and function as a 
nucleophilic source of aluminium in substitution chemistry. 

Drawing inspiration from the two major classes of anionic carbon- 
centred nucleophiles ubiquitous in organic synthesis (namely, group 
1 metal alkyl/aryl compounds and Grignard reagents), we set out 
to synthesize potassium and magnesium aluminyl compounds of the 
types [K{Al(NR₂)₂}]ₙ and [RMg{Al(NR₂)₂}]ₙ. Deprotonation of the 
bifunctional dimethylxanthene-derived secondary aniline (NON)H₂ 
with K[N(SiMe₃)₂], followed by reaction with AlI₃, generates the Al(III) 
iodide (NON)AlI (Fig. 1, Extended Data Fig. 1 and Extended Data 
Table 1). (NON)AlI is a versatile substrate for reduction chemistry: 
reaction with potassium graphite in toluene or benzene solution 
provides access to Al(I) or Al(II) products depending on reaction 
stoichiometry. The use of excess KC₈ forms the dimeric potassium aluminyl 
complex [K{Al(NON)}]₂ in yields of approximately 75%, whereas the 
use of less forcing conditions generates the Al-Al bonded dialane, 
[Al(NON)]₂ (Fig. 1). Both compounds were characterized by 
spectroscopic and analytical techniques, and their solid-state structures 
determined by single crystal X-ray diffraction (Fig. 2, Extended Data Fig. 2 
and Supplementary Information). 

Given the well-known preference of aluminium for the +3 oxidation state’,
and the difficulty in locating hydrogen atoms by X-ray crystallography, it
was important to rule out the presence of any metal-bound hydrogen atoms
within the dimeric molecular unit of [K{Al(NON)}]2. In particular, given the
presence of potassium counter-ions, we wished to rule out the formation of
the dihydroaluminate [K{H2Al(NON)}]2. Accordingly, [K{H2Al(NON)}]2 was
synthesized independently by the reaction of (NON)AlI with excess K[AlH4],
and characterized by spectroscopic, analytical and crystallographic
techniques (Extended Data Fig. 3). These measurements revealed considerable
differences from the corresponding data of [K{Al(NON)}]2. These included
shifted 1H NMR resonances for the diamido-dimethylxanthene ligand backbone,
and an additional broad signal at δH = 3.88 p.p.m. associated with the
aluminium-bound hydrogen atoms in [K{H2Al(NON)}]2; an additional Al-H
stretching band in the infrared spectrum at 1,688 cm−1 (Extended Data
Fig. 4); and shorter Al-O and Al-N distances in the solid-state structure
(mean values of 2.137 Å and 1.923 Å respectively, compared with 2.279(2) Å
and 1.956(2) or 1.963(2) Å for [K{Al(NON)}]2). Perhaps even more definitive
is the additional finding that [K{Al(NON)}]2 adds dihydrogen under ambient
conditions in benzene solution (or at 2 atm pressure in the solid state),
with its conversion into [K{H2Al(NON)}]2 being confirmed by in situ NMR
measurements.

Fig. 1 | Syntheses of the potassium aluminyl compound [K{Al(NON)}]2 and the
dialane [Al(NON)]2. Both compounds are prepared from (NON)AlI by reduction
with potassium graphite.

Fig. 2 | Geometric and electronic structure of [K{Al(NON)}]2. a, Molecular
structure of [K{Al(NON)}]2 as determined by X-ray crystallography. Hydrogen
atoms have been omitted and selected carbon atoms are shown in wireframe
format for clarity; thermal ellipsoids have been drawn at the 35%
probability level. Key distances (Å) and angles (°): Al(1)...Al(1′)
6.627(1), Al(1)...K(1) 4.070(1), Al(1)...K(1′) 3.844(1), Al(1)-N(1)
1.963(2), Al(1)-N(2) 1.956(2), Al(1)-O(1) 2.279(2), N(1)-Al(1)-N(2)
128.1(1). b, The HOMO of [K{Al(NON′)}]2 as calculated by DFT.
Isovalue = 0.05.

The structure of [K{Al(NON)}]2 in the solid state is shown by
crystallographic studies to be a centrosymmetric dimer, featuring two
formally anionic [Al(NON)]− units held together through flanking
potassium-arene contacts involving the two potassium counter-ions
(d(K...C) = 3.226(3)-3.474(3) Å). The non-bonded Al...Al separation is more
than double the length of the formal Al-Al single bond found in [Al(NON)]2
(6.627(1) compared with 2.646(1) Å)20, whereas the K...Al contacts (3.844(1)
and 4.070(1) Å) are comparable to (or slightly longer than) those reported,
for example, for the tetragallium cluster K2[Ar2Ga4], which also features
potassium counter-ions sandwiched between flanking aryl rings
(3.471(1)-3.833(1) Å; Ar = C6H3(C6H2iPr3-2,4,6)2-2,6)21.

The aluminium centre in [K{Al(NON)}]2 features a flattened pyramidal
coordination sphere (∠(N-Al-N) = 128.1(1)°; ∠(N-Al-O) = 72.9(1), 72.5(1)°).
The Al-N distances (1.956(2) and 1.963(2) Å) are consistent with an Al(I)
compound: the corresponding bond lengths measured for [Al(NON)]2 and
(NON)AlI are successively shorter (1.895(2)-1.901(2) Å and 1.846(2) Å,
respectively), in line with the reduced covalent radii of Al(II) and Al(III)
(compared with that of Al(I))!?. In addition, whereas the Al-O distances
associated with the neutral ether donor are very similar for [Al(NON)]2 and
(NON)AlI (1.976(2) or 1.981(2) Å and 1.967(2) Å, respectively), that
measured for [K{Al(NON)}]2 is markedly longer (2.279(2) Å), consistent with
the reduced Lewis acidity expected for a formally anionic metal centre. This
metal-oxygen distance is, however, noticeably shorter than that measured for
the isostructural gallium analogue [K{Ga(NON)}]2 (2.542(2) Å; Extended Data
Fig. 5) and the associated ‘puckering’ of the dimethylxanthene backbone is
more pronounced (the angle between least squares planes of aromatic rings is
38.6°, compared with 25.9° for [K{Ga(NON)}]2). Both of these observations
suggest that the dative oxygen-metal interaction is structurally more
important for the aluminium system, with the implied population of the
Al-centred pz orbital potentially leading to a relatively wide HOMO-LUMO gap
(the energy difference between the highest occupied and lowest unoccupied
molecular orbitals).

The electronic structure of [K{Al(NON)}]2 was probed using density
functional theory (DFT) calculations, both on the monomeric aluminyl anion
[Al(NON)]− and on the model dimeric system [K{Al(NON′)}]2 (in which the iPr
and tBu substituents were replaced by Me for computational efficiency).
These calculations suggest that the HOMO-LUMO gap for the [Al(NON)]−
fragment is approximately 338 kJ mol−1 (3.50 eV), and that a singlet
electronic ground state is therefore predicted (ΔEST = 166 kJ mol−1,
1.72 eV)17,18. Presumably, a major contributory factor here is the fact that
the LUMO + 3 (rather than the LUMO) features the primary contribution from
the aluminium pz orbital, and that this orbital is effectively Al-O σ*
antibonding in character (consistent with the observed Al-O distance).
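
The paired kJ mol−1 and eV values quoted for the calculated gaps are related by a fixed factor: 1 eV per particle corresponds to roughly 96.485 kJ mol−1. As a quick consistency check, the minimal Python sketch below converts the two quoted energies. The function and constant names are illustrative only and are not taken from the original work.

```python
# 1 eV per particle expressed in kJ per mole
# (elementary charge x Avogadro constant, i.e. the Faraday constant / 1000).
KJ_PER_MOL_PER_EV = 96.485


def kj_mol_to_ev(energy_kj_mol: float) -> float:
    """Convert a molar energy in kJ/mol to an energy per particle in eV."""
    return energy_kj_mol / KJ_PER_MOL_PER_EV


def ev_to_kj_mol(energy_ev: float) -> float:
    """Convert an energy per particle in eV to a molar energy in kJ/mol."""
    return energy_ev * KJ_PER_MOL_PER_EV


if __name__ == "__main__":
    # Values quoted in the text for the [Al(NON)]- fragment.
    quoted = [("HOMO-LUMO gap", 338.0), ("singlet-triplet gap", 166.0)]
    for label, value in quoted:
        print(f"{label}: {value:.0f} kJ/mol = {kj_mol_to_ev(value):.2f} eV")
```

Rounded to two decimal places, both conversions reproduce the eV values given in the text.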

The energy of the HOMO calculated for the (hypothetical) monomeric
[Al(NON)]− anion in the gas phase is very high at −101 kJ mol−1 (−1.05 eV),
but this value decreases to −395 kJ mol−1 (−4.10 eV) for the
[K{Al(NON′)}]2 dimer, the HOMO in this case being the out-of-phase
combination of Al-centred lone pairs (Fig. 2). The idea that the dimeric
K+-bridged structure offers considerably enhanced stability gains additional
support from diffusion-ordered spectroscopy (DOSY) NMR measurements carried
out in C6D6 solution. The hydrodynamic radius determined for [K{Al(NON)}]2
(9.7 Å) can be compared to values of 7.7 Å and 8.9 Å measured for monomeric
(NON)AlI and dinuclear [Al(NON)]2, suggesting that the dimeric structure of
the potassium aluminyl compound is retained in the solution phase.

Fig. 3 | Exploitation of the aluminium-centred nucleophilic reactivity of
[K{Al(NON)}]2. The formation of Al-H, Al-C and Al-M bonds is achieved by the
reaction of the potassium aluminyl reagent with appropriate electrophiles.

Nonetheless, the HOMO energy for [K{Al(NON′)}]2 is markedly higher than that
calculated for the charge-neutral Al(I) compound Al(Nacnac)Dipp using the
same method (−453 kJ mol−1, −4.70 eV)’. Taken together with the slightly
higher contribution of the aluminium 3p orbital to the lone pair (24%
compared with 10% for Al(Nacnac)Dipp), these data suggest that
[K{Al(NON)}]2 should have greater potential to act as an aluminium-centred
nucleophile. This assertion can be verified experimentally by the use of
[K{Al(NON)}]2 in a range of unprecedented aluminium-element bond-forming
reactions with electrophilic partners (Fig. 3). Its reaction with methyl
triflate or methyl iodide to give (NON)AlMe (Fig. 4) demonstrates a new
approach for the formation of aluminium alkyls that complements conventional
methods using hydroalumination chemistry or aluminium electrophiles’. In a
similar manner, reactions with Brønsted acids can be used to generate the
monomeric hydride (NON)AlH via protonation at aluminium (Extended Data
Fig. 6). Covalent metal-aluminium bonds can be constructed in a similar
manner. The combination of [K{Al(NON)}]2 with (NON)AlI defines an
alternative strategy for the formation of the dialuminium compound
(NON)Al-Al(NON), whereas its reaction with (Nacnac)MesMgI(OEt2) (where
(Nacnac)Mes = (NMesCMe)2CH, and Mes = 2,4,6-Me3C6H2) results in the
formation of the magnesium aluminyl compound (Nacnac)MesMgAl(NON) (Figs. 3
and 4). The latter features an unsupported Mg-Al bond, with the associated
bond length (2.696(1) Å) comparable to the sum of the respective covalent
radii (2.62 Å)22,23. The metal-metal distance is also in line with that
reported for the single Mg-Mg covalent bond in the Mg(I) dimer
[Mg(Nacnac)Dipp]2 (d(Mg-Mg) = 2.851(1) Å)24, with due allowance for the
differing radii of aluminium and magnesium (Δrcov = 0.2 Å)*. As such,
whereas [K{Al(NON)}]2 represents an aluminyl analogue of organo-group 1
nucleophiles of the type RLi and RK‘, (Nacnac)MesMgAl(NON) can be thought of
as the corresponding counterpart of a Grignard reagent, RMgX”.

Fig. 4 | Molecular structures of an aluminium alkyl and a magnesium aluminyl
compound formed via reactions of [K{Al(NON)}]2, as determined by X-ray
crystallography. Hydrogen atoms have been omitted and selected carbon atoms
shown in wireframe format for clarity; thermal ellipsoids have been drawn at
the 35% probability level. a, (NON)AlMe. Key bond lengths (Å) and angles
(°): Al(1)-C(48) 2.029(3), Al(1)-N(1) 1.870(2), Al(1)-N(2) 1.867(2),
Al(1)-O(1) 1.994(2), N(1)-Al(1)-N(2) 138.3(1). b, (Nacnac)MesMgAl(NON). Key
bond lengths (Å) and angles (°): Al(1)-Mg(1) 2.696(1), Al(1)-N(1) 1.918(2),
Al(1)-N(2) 1.904(1), Al(1)-O(1) 1.992(1), Mg(1)-N(3) 2.033(2), Mg(1)-N(4)
2.035(1), N(1)-Al(1)-N(2) 126.4(1).

Despite having no precedent as a structurally authenticated aluminyl
species, [K{Al(NON)}]2 is stable for several days at 300 K, both in benzene
solution and in the solid state. At 330 K, however, clean conversion to a
single species [K{Ph(H)Al(NON)}]2 is observed, via formal oxidative cleavage
of a C-H bond of benzene at Al(I) (Extended Data Fig. 7). To our knowledge
this represents a first demonstration of the intermolecular oxidative
addition of a C-H bond in benzene at a single well-defined main-group metal
centre. Although main-group compounds that activate benzene by formal
deprotonation are known25,26, oxidative cleavage is precedented only for
more reactive C-H bonds".

In summary, we have synthesized and structurally characterized the first
example, to our knowledge, of an anionic Al(I) (aluminyl) system, a compound
that offers potential as a nucleophilic reagent in the formation of
unsupported aluminium-element bonds and in the activation of typically inert
small-molecule substrates.


METHODS 


General considerations. All manipulations were carried out using standard
Schlenk-line or dry-box techniques under an atmosphere of argon or
dinitrogen. Solvents were degassed by sparging with argon and dried by
passing through a column of the appropriate drying agent. NMR spectra were
measured in benzene-d6 (which was dried over potassium) or THF-d8 (which was
dried over LiAlH4), with the solvent then being distilled under reduced
pressure and stored under argon in Teflon valve ampoules. NMR samples were
prepared under argon in 5-mm Wilmad 507-PP tubes fitted with J. Young Teflon
valves. 1H and 13C{1H} NMR spectra were recorded on a Bruker Avance III HD
NanoBay 400 MHz or Bruker Avance III 500 MHz spectrometer at ambient
temperature, referenced internally to residual protio-solvent (1H) or
solvent (13C) resonances and are reported relative to tetramethylsilane
(δ = 0 p.p.m.). Assignments were confirmed using two-dimensional 1H-1H and
13C-1H NMR correlation experiments. Chemical shifts are quoted in δ (p.p.m.)
and coupling constants (J) in Hz. Elemental analyses were carried out by
London Metropolitan University.

Starting materials. (NON)H2 (ref. 27), (Nacnac)MesMgI(OEt2) (ref. 28) and
KAlH4 (ref. 29) were prepared according to literature methods. AlI3 and GaI3
were prepared in situ by sonicating a mixture of the appropriate metal with
1.5 equivalents of I2 in dry toluene under an atmosphere of argon until the
solution became colourless. All other reagents were used as received.

Syntheses of novel compounds. (NON)AlI. To a solution of K2(NON) (2.00 g,
2.67 mmol) in toluene (30 ml) was added a solution of AlI3 (1.09 g,
2.67 mmol) in toluene (30 ml) at −78 °C for 15 min. The reaction mixture was
slowly warmed to room temperature and stirred for a further 16 h. The
reaction mixture was filtered and volatiles from the filtrate were removed
in vacuo to give an off-white solid. This solid was washed with hexane
(2 × 20 ml) to give (NON)AlI as a white powder (1.55 g, 70% yield). X-ray
quality crystals of (NON)AlI were obtained by recrystallizing this solid
from warm toluene. 1H NMR (400 MHz, C6D6, 298 K): δ = 1.11 (d,
3JHH = 5.8 Hz, 12H, CH(CH3)2), 1.15 (s, 18H, C(CH3)3), 1.40 (d,
3JHH = 6.0 Hz, 12H, DippCH(CH3)2), 1.51 (s, 6H, XA-C(CH3)2), 3.54 (br., 4H,
CH(CH3)2), 6.39 (s, 2H, XA-o-CH), 6.78 (s, 2H, XA-p-CH), 7.27 (s, 6H, ArH);
13C{1H} NMR (125.8 MHz, C6D6): δ = 24.7, 25.6 (CH(CH3)2), 27.4
(XA-C(CH3)2), 29.6 (CH(CH3)2), 31.6 (C(CH3)3), 35.2 (C(CH3)3), 37.4
(XA-C(CH3)2), 109.1, 112.6, 124.7, 127.3, 128.0, 128.4, 133.2, 139.2, 140.8,
142.6, 147.0, 149.9 (Ar-C); anal. calc. for C47H62AlIN2O: C 68.43%,
H 7.58%, N 3.40%, found: C 68.36%, H 7.68%, N 3.36%.
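
The calculated elemental analysis values follow directly from the molecular formula. As an illustration, the short Python sketch below recomputes the C, H and N mass percentages for (NON)AlI, C47H62AlIN2O. The helper names and the rounded standard atomic masses are assumptions made for this example, and the parser handles only simple formulas without brackets.

```python
import re

# Rounded standard atomic masses in g/mol (assumed values for this sketch).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Mg": 24.305,
    "Al": 26.982, "K": 39.098, "Ga": 69.723, "I": 126.904,
}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C47H62AlIN2O' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mass_percent(formula: str) -> dict:
    """Return the mass percentage of each element in the formula."""
    counts = parse_formula(formula)
    total = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / total for el, n in counts.items()}


if __name__ == "__main__":
    percentages = mass_percent("C47H62AlIN2O")
    for element in ("C", "H", "N"):
        print(f"{element}: {percentages[element]:.2f}%")
```

The same helper can be applied to the other formulas reported in this section to cross-check the calculated values against the quoted ones.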

[Al(NON)]2. Method A: to a suspension of KC8 (0.049 g, 0.364 mmol) in
toluene (25 ml) was added a solution of (NON)AlI (0.300 g, 0.364 mmol) in
toluene (25 ml) at room temperature. The reaction mixture was stirred for a
further 16 h at room temperature whereupon volatiles were removed in vacuo.
The residue was extracted with pentane (15 ml), the extract filtered and
volatiles removed in vacuo to give [Al(NON)]2 as an off-white solid
(0.120 g, 47%). X-ray quality crystals of [Al(NON)]2 were obtained by
crystallizing this solid from hexane. Method B: to a solution of
[K{Al(NON)}]2 (0.200 g, 0.136 mmol) in toluene (10 ml) was added a solution
of (NON)AlI (0.246 g, 0.298 mmol) in toluene (10 ml) at room temperature.
The reaction mixture was stirred for a further 16 h at room temperature
whereupon volatiles were removed in vacuo. The residue was extracted with
pentane (15 ml), the extract filtered and volatiles removed in vacuo to give
[Al(NON)]2 as an off-white solid (0.325 g, 86%). 1H NMR (400 MHz, C6D6,
298 K): δ = −0.12 (d, 3JHH = 6.8 Hz, 6H, CH(CH3)2), 0.11 (d, 3JHH = 6.9 Hz,
6H, CH(CH3)2), 0.66 (d, 3JHH = 6.7 Hz, 6H, CH(CH3)2), 0.75 (d,
3JHH = 6.3 Hz, 6H, CH(CH3)2), 1.09 (d, 3JHH = 6.9 Hz, 6H, CH(CH3)2), 1.15
(s, 36H, C(CH3)3), 1.16 (d, 3JHH = 6.9 Hz, 6H, CH(CH3)2), 1.35 (d,
3JHH = 6.5 Hz, 6H, CH(CH3)2), 1.58 (d, 3JHH = 7.1 Hz, 6H, CH(CH3)2), 1.65
(s, 6H, XA-C(CH3)2), 1.81 (s, 6H, XA-C(CH3)2), 2.01 (sept., 3JHH = 6.7 Hz,
2H, CH(CH3)2), 3.02 (sept., 3JHH = 6.8 Hz, 2H, CH(CH3)2), 3.11 (sept.,
3JHH = 6.7 Hz, 2H, CH(CH3)2), 3.90 (sept., 3JHH = 6.6 Hz, 2H, CH(CH3)2),
6.17 (s, 2H, XA-o-CH), 6.57 (s, 2H, XA-o-CH), 6.63 (s, 2H, XA-p-CH), 6.68
(s, 2H, XA-p-CH), 6.92-7.32 (m, 12H, ArH); 13C{1H} NMR (126 MHz, C6D6):
δ = 22.2 (CH(CH3)2), 23.2 (XA-CH3), 23.6, 23.8, 24.5, 24.9, 25.0, 26.9
(CH(CH3)2), 27.2, 29.0 (CH(CH3)2), 29.5 (XA-CH3), 29.9 (CH(CH3)2), 30.5
(CH(CH3)2), 30.6 (CH(CH3)2), 31.6, 31.7 (C(CH3)3), 35.0, 35.1 (C(CH3)3),
39.6 (XA-C(CH3)2), 108.0, 109.5, 112.7, 114.2, 123.3, 124.1, 124.3, 126.0,
126.1, 126.6, 137.0, 139.2, 141.2, 142.1, 142.9, 144.5, 145.4, 145.8, 147.2,
147.5, 148.0, 148.3, 148.7, 150.1 (Ar-C); anal. calc. for C94H124Al2N4O2:
C 80.88%, H 8.95%, N 4.01%, found: C 80.66%, H 9.06%, N 4.19%.
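
The isolated yields quoted for the two routes to [Al(NON)]2 can be checked against the reagent quantities. The Python sketch below is a rough worked example rather than the authors' own calculation: the function name is hypothetical, and the stoichiometry used for Method B ([K{Al(NON)}]2 + 2 (NON)AlI giving 2 [Al(NON)]2 + 2 KI) is an assumption inferred from the reagents listed.

```python
def percent_yield(isolated_g, product_molar_mass, limiting_mmol, product_per_limiting):
    """Isolated yield (%) relative to the limiting reagent.

    product_per_limiting is the number of moles of product formed per mole
    of limiting reagent according to the assumed balanced equation.
    """
    theoretical_g = limiting_mmol * product_per_limiting * product_molar_mass / 1000.0
    return 100.0 * isolated_g / theoretical_g


# Molar mass of [Al(NON)]2, C94H124Al2N4O2, from rounded atomic masses (g/mol).
M_DIALANE = 94 * 12.011 + 124 * 1.008 + 2 * 26.982 + 4 * 14.007 + 2 * 15.999

# Method A: two (NON)AlI molecules are needed per dialane molecule.
yield_a = percent_yield(0.120, M_DIALANE, 0.364, 0.5)

# Method B (assumed stoichiometry): each potassium aluminyl dimer, the
# limiting reagent, can give two dialane molecules.
yield_b = percent_yield(0.325, M_DIALANE, 0.136, 2.0)

if __name__ == "__main__":
    print(f"Method A: {yield_a:.0f}%")
    print(f"Method B: {yield_b:.0f}%")
```

Under these assumptions the rounded results agree with the yields reported for Methods A and B.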

[K{Al(NON)}]2. To a suspension of KC8 (0.410 g, 3.03 mmol) in toluene
(25 ml) was added a solution of (NON)AlI (1.00 g, 1.21 mmol) in toluene
(25 ml) at room temperature. The reaction mixture was stirred for 16 h at
room temperature producing a colour change from colourless to yellow. The
reaction mixture was filtered and volatiles from the filtrate were removed
in vacuo to give [K{Al(NON)}]2 as a bright yellow powder (0.670 g, 76%
yield). X-ray quality crystals of [K{Al(NON)}]2 were obtained by
recrystallizing this bright yellow powder from warm benzene. 1H NMR
(400 MHz, C6D6, 298 K): δ = 1.11 (d, 3JHH = 6.6 Hz, 24H, CH(CH3)2), 1.16 (d,
3JHH = 6.6 Hz, 24H, CH(CH3)2), 1.27 (s, 36H, C(CH3)3), 1.69 (s, 12H,
XA-C(CH3)2), 3.70 (sept., 3JHH = 6.6 Hz, 8H, CH(CH3)2), 6.15 (d,
4JHH = 1.6 Hz, 4H, XA-o-CH), 6.76 (s, 4JHH = 1.6 Hz, 4H, XA-p-CH), 6.99-7.14
(m, 12H, ArH); 13C{1H} NMR (101 MHz, C6D6): δ = 24.9, 25.1 (CH(CH3)2), 28.1
(CH(CH3)2), 28.9 (XA-C(CH3)2), 32.1 (C(CH3)3), 35.1 (C(CH3)3), 37.1
(XA-C(CH3)2), 106.2, 109.2, 124.1, 125.3, 128.6, 133.2, 142.4, 142.8, 146.8,
147.6, 149.5 (Ar-C); IR ν (cm−1, Nujol): 1,642(m), 1,584(m), 1,485(m),
1,442(s), 1,419(s), 1,362(m), 1,321(s), 1,255(m), 1,200(m), 1,175(s),
1,135(m), 1,112(m), 1,102(m), 1,055(m), 1,044(m), 1,027(m), 1,014(s),
990(m), 943(s), 934(s), 909(m), 863(m), 848(m), 801(m), 794(s), 773(s),
766(m), 729(m), 675(m), 655(m), 637(m), 620(m), 582(s), 569(s); anal. calc.
for C94H124Al2K2N4O2: C 76.59%, H 8.48%, N 3.80%, found: C 76.74%,
H 8.32%, N 3.59%.
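
Whether reduction of (NON)AlI stops at the Al(II) dialane or proceeds to the Al(I) aluminyl depends on how much KC8 is used. On a simple electron count, Al(III) to Al(II) needs one equivalent of reductant per aluminium and Al(III) to Al(I) needs two. The Python sketch below works out the equivalents actually used in the two procedures from the quoted masses and amounts; the helper names are illustrative.

```python
# Molar mass of KC8 from rounded atomic masses (g/mol).
M_KC8 = 39.098 + 8 * 12.011


def mmol(mass_g: float, molar_mass: float) -> float:
    """Amount of substance in mmol for a given mass in grams."""
    return 1000.0 * mass_g / molar_mass


def equivalents(reagent_mmol: float, substrate_mmol: float) -> float:
    """Molar equivalents of reagent relative to substrate."""
    return reagent_mmol / substrate_mmol


# Aluminyl synthesis: 0.410 g KC8 with 1.21 mmol (NON)AlI.
aluminyl_equiv = equivalents(mmol(0.410, M_KC8), 1.21)

# Dialane synthesis (Method A): 0.049 g KC8 with 0.364 mmol (NON)AlI.
dialane_equiv = equivalents(mmol(0.049, M_KC8), 0.364)

if __name__ == "__main__":
    print(f"aluminyl route: {aluminyl_equiv:.1f} equiv. KC8 per Al")
    print(f"dialane route:  {dialane_equiv:.1f} equiv. KC8 per Al")
```

The aluminyl route therefore uses an excess over the two equivalents formally required, while the dialane route uses close to the single equivalent needed for one-electron reduction.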

(NON)AlH. To a solution of [K{Al(NON)}]2 (0.200 g, 0.136 mmol) in toluene
(10 ml) was added a solution of (NON)H2 (0.092 g, 0.136 mmol) dropwise at
−78 °C. The solution was slowly warmed to room temperature and stirred for a
further 2 h, whereupon volatiles were removed in vacuo. The residue was
extracted with ice-cold hexane (15 ml), the extract filtered and volatiles
removed from the filtrate in vacuo to give (NON)AlH as a colourless solid
(0.167 g, 88%). X-ray quality crystals of (NON)AlH were obtained by
crystallizing this solid from warm toluene. 1H NMR (400 MHz, C6D6, 298 K):
δ = 1.16 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.19 (s, 18H, C(CH3)3), 1.25 (d,
3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.52 (s, 6H, C(CH3)2), 3.56 (sept.,
3JHH = 6.8 Hz, 4H, CH(CH3)2), 4.99 (br., 1H, AlH), 6.43 (d, 4JHH = 1.9 Hz,
2H, XA-o-CH), 6.76 (d, 4JHH = 1.9 Hz, 2H, XA-p-CH), 7.27 (s, 6H, ArH);
13C{1H} NMR (126 MHz, C6D6): δ = 24.9, 25.6 (CH(CH3)2), 27.5 (XA-C(CH3)2),
29.0 (CH(CH3)2), 31.8 (C(CH3)3), 35.3 (C(CH3)3), 37.6 (XA-C(CH3)2), 108.1,
110.6, 124.6, 127.0, 128.4, 128.6, 133.4, 138.8, 141.1, 142.9, 147.6, 150.0
(Ar-C); IR ν (cm−1, Nujol): 1,875(s, Al-H), 1,640(s), 1,583(m), 1,418(m),
1,364(s), 1,323(m), 1,310(m), 1,254(s), 1,200(m), 1,175(s), 1,100(m),
1,015(m), 935(m), 796(m), 727(m), 714(s); anal. calc. for C47H63AlN2O:
C 80.76%, H 9.08%, N 3.86%, found: C 80.72%, H 9.16%, N 3.94%.

(NON)AlMe. To a solution of [K{Al(NON)}]2 (0.200 g, 0.136 mmol) in toluene
(10 ml) was added a solution of MeI (0.046 g, 0.326 mmol) in toluene (10 ml)
at room temperature. The reaction instantly changed colour from bright
yellow to colourless upon addition. The mixture was stirred for a further
2 h at room temperature whereupon volatiles were removed in vacuo. The
residue was extracted with toluene (15 ml), the extract filtered and
volatiles removed from the filtrate in vacuo to give (NON)AlMe as an
off-white solid (0.184 g, 95%). X-ray quality crystals of (NON)AlMe were
obtained by recrystallizing this solid from warm toluene. 1H NMR (400 MHz,
C6D6, 298 K): δ = −0.35 (s, 3H, AlCH3), 1.12 (d, 3JHH = 6.8 Hz, 12H,
CH(CH3)2), 1.18 (s, 18H, C(CH3)3), 1.26 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2),
1.58 (s, 6H, XA-C(CH3)2), 3.53 (sept., 3JHH = 6.8 Hz, 4H, CH(CH3)2), 6.34
(d, 4JHH = 1.8 Hz, 2H, XA-o-CH), 6.76 (s, 4JHH = 1.8 Hz, 2H, XA-p-CH), 7.24
(s, 6H, ArH); 13C{1H} NMR (101 MHz, C6D6): δ = −12.9 (AlCH3), 24.5, 25.8
(CH(CH3)2), 27.4 (XA-C(CH3)2), 29.1 (CH(CH3)2), 31.7 (C(CH3)3), 35.2
(C(CH3)3), 37.6 (XA-C(CH3)2), 107.9, 111.5, 124.6, 126.7, 133.8, 140.7,
141.7, 143.5, 147.1, 149.5 (Ar-C); anal. calc. for C48H65AlN2O: C 80.85%,
H 9.19%, N 3.93%, found: C 80.97%, H 9.28%, N 3.78%.

(Nacnac)MesMgAl(NON). To a solution of [K{Al(NON)}]2 (0.200 g, 0.137 mmol)
in benzene (15 ml) was added a solution of [(MesNacnac)MgI(OEt2)] (0.167 g,
0.299 mmol) in benzene (15 ml) at room temperature over 5 min. The reaction
mixture was stirred for 16 h at room temperature, producing a colour change
from yellow to colourless. Volatiles were removed in vacuo, the residue was
extracted with toluene (20 ml), the extract filtered, concentrated in vacuo
(to around 5 ml) and slowly cooled to 5 °C overnight to give
(Nacnac)MesMgAl(NON) as colourless crystals (0.245 g, 86% yield). 1H NMR
(400 MHz, C6D6, 298 K): δ = 1.00 (d, 3JHH = 6.3 Hz, 6H, DippCH(CH3)2),
1.10-1.12 (m, 18H, DippCH(CH3)2), 1.16 (s, 18H, C(CH3)3), 1.39 (s, 3H,
NCCH3), 1.58 (s, 6H, o-CH3), 1.60 (s, 3H, NCCH3), 1.71 (s, 3H, XA-CH3), 1.76
(s, 3H, XA-CH3), 2.14 (s, 3H, p-CH3), 2.19 (s, 3H, p-CH3), 2.32 (s, 6H,
o-CH3), 3.35 (sept, 3JHH = 6.3 Hz, 2H, DippCH(CH3)2), 3.57 (sept,
3JHH = 6.4 Hz, 2H, DippCH(CH3)2), 4.86 (s, 1H, NCCH), 6.19 (s, 2H,
XA-o-CH), 6.58 (s, 2H, Mes-m-CH), 6.69 (s, 2H, XA-p-CH), 6.84 (s, 2H,
Mes-m-CH), 7.13-7.18 (m, 6H, ArH); 13C{1H} NMR (125.8 MHz, C6D6): δ = 18.4,
19.4 (Mes-o-CH3), 20.9, 21.0 (Mes-p-CH3), 22.5 (XA-CH3), 22.7, 23.9 (NCCH3),
24.0, 25.1, 26.2, 26.6 (DippCH(CH3)2), 28.7, 29.3 (DippCH(CH3)2), 31.3
(XA-CH3), 31.8 (C(CH3)3), 35.1 (C(CH3)3), 37.9 (XA-C(CH3)2), 96.6 (NCCH),
106.5, 110.8, 123.7, 124.6, 125.9, 127.9, 128.2, 129.5, 129.7, 130.8, 132.2,
133.1, 133.5, 134.5, 141.3, 143.4, 143.5, 143.6, 145.6, 146.5, 148.3, 148.7
(Ar-C), 168.1, 168.6 (NCCH3); anal. calc. for C70H91AlMgN4O: C 79.63%,
H 8.69%, N 5.31%, found: C 79.64%, H 8.56%, N 5.14%.

[K{H2Al(NON)}]2. Method A: to a suspension of K[AlH4] (0.127 g, 1.82 mmol)
in benzene (10 ml) was added a solution of (NON)AlI (0.300 g, 0.364 mmol) in
benzene at room temperature. The reaction mixture was heated to reflux and
stirred for a further 48 h. After allowing the reaction mixture to cool to
room temperature, it was filtered and volatiles were removed from the
filtrate in vacuo to give [K{H2Al(NON)}]2 as a colourless solid (0.258 g,
96%). Method B: a solution of [K{Al(NON)}]2 (0.200 g, 0.136 mmol) in toluene
(10 ml) was prepared in a 50 ml J. Young sample flask. The solution was
frozen, the atmosphere evacuated (5.0 × 10−2 mbar) and refilled with H2
(2.0 bar). The flask was slowly warmed to room temperature where it was left
to stand for 5 days, producing a colour change from yellow to colourless.
Volatiles were removed in vacuo to give [K{H2Al(NON)}]2 as a colourless
solid (0.198 g, 99%). X-ray quality crystals were obtained by
recrystallizing this solid from warm benzene. 1H NMR (400 MHz, C6D6, 298 K):
δ = 1.10 (d, 3JHH = 6.6 Hz, 24H, CH(CH3)2), 1.15 (d, 3JHH = 6.6 Hz, 24H,
CH(CH3)2), 1.26 (s, 36H, C(CH3)3), 1.60 (s, 12H, C(CH3)2), 3.73 (sept.,
3JHH = 6.6 Hz, 8H, CH(CH3)2), 3.88 (br., 4H, AlH2), 6.07 (s, 4H, XA-o-CH),
6.71 (s, 4H, XA-p-CH), 7.01-7.10 (m, 12H, ArH); 13C{1H} NMR (126 MHz,
C6D6): δ = 25.1, 25.6 (CH(CH3)2), 28.1 (XA-C(CH3)2), 28.4 (CH(CH3)2), 32.0
(C(CH3)3), 35.1 (C(CH3)3), 36.8 (XA-C(CH3)2), 106.2, 108.8, 124.7, 125.9,
128.4, 128.6, 131.8, 140.82, 144.2, 145.9, 147.5, 150.0 (Ar-C); anal. calc.
for C94H128Al2K2N4O2: C 76.38%, H 8.73%, N 3.79%, found: C 76.46%,
H 8.58%, N 3.65%.

[K{Ph(H)Al(NON)}]2. A solution of [K{Al(NON)}]2 (0.200 g, 0.136 mmol) in
benzene (10 ml) was heated at 60 °C for 4 days without stirring. During the
4-day reaction, the solution slowly changed colour from yellow to almost
colourless, and colourless crystals grew on the wall of the flask. The
reaction was cooled to room temperature, the solution decanted from the
flask and the colourless crystals dried in vacuo to give
[K{Ph(H)Al(NON)}]2 as a colourless powder (0.186 g, 84%). 1H NMR (400 MHz,
THF-d8, 298 K): δ = 0.19 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 0.72 (d,
3JHH = 6.8 Hz, 12H, CH(CH3)2), 0.95 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.10
(s, 36H, C(CH3)3), 1.25 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.69 (s, 6H,
C(CH3)2), 1.83 (s, 6H, C(CH3)2), 3.02 (sept., 3JHH = 6.8 Hz, 4H, CH(CH3)2),
3.72 (sept., 3JHH = 6.8 Hz, 4H, CH(CH3)2), 5.66 (d, 4JHH = 1.9 Hz, 4H,
XA-o-CH), 6.43 (d, 4JHH = 1.9 Hz, 4H, XA-p-CH), 6.88-7.15 (m, 18H, ArH),
7.51 (br., 4H, ArH); 13C{1H} NMR (126 MHz, C6D6): δ = 23.3 (XA-C(CH3)2),
23.8, 24.9, 25.7, 26.4 (CH(CH3)2), 28.0, 28.8 (CH(CH3)2), 32.0 (C(CH3)3),
33.9 (XA-C(CH3)2), 35.2 (C(CH3)3), 36.7 (XA-C(CH3)2), 103.5, 108.8, 123.2,
124.3, 124.6, 125.2, 126.8, 128.8, 131.0, 138.1, 140.3, 145.4, 146.3, 146.4,
148.6, 150.2 (Ar-C); IR ν (cm−1, Nujol): 1,636(s), 1,581(s), 1,415(m),
1,360(m), 1,330(m), 1,310(s), 1,254(s), 1,210(s), 1,112(m), 1,016(m),
903(m), 806(m), 781(s), 725(s), 668(s).

(NON)GaI. To a solution of K2(NON) (2.00 g, 2.67 mmol) in toluene (30 ml)
was added a solution of GaI3 (1.20 g, 2.67 mmol) in toluene (30 ml) at
−78 °C for 15 min. The reaction mixture was slowly warmed to room
temperature where it was stirred for a further 16 h. The reaction mixture
was filtered and volatiles from the filtrate were removed in vacuo to give
an off-white solid. This solid was washed with hexane (2 × 20 ml) to give
(NON)GaI as a colourless powder (1.80 g, 78% yield). X-ray quality crystals
of (NON)GaI were obtained by recrystallizing this solid from warm toluene.
1H NMR (400 MHz, C6D6, 298 K): δ = 1.10 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2),
1.16 (s, 18H, C(CH3)3), 1.34 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.58 (s, 6H,
XA-C(CH3)2), 3.48 (sept., 3JHH = 6.8 Hz, 4H, CH(CH3)2), 6.45 (d,
4JHH = 1.7 Hz, 2H, XA-o-CH), 6.83 (s, 4JHH = 1.7 Hz, 2H, XA-p-CH), 7.22-7.29
(m, 6H, ArH); 13C{1H} NMR (126 MHz, C6D6): δ = 24.5, 25.5 (CH(CH3)2), 27.3
(XA-CH3), 29.3 (CH(CH3)2), 31.7 (C(CH3)3), 35.2 (C(CH3)3), 38.0
(XA-C(CH3)2), 109.2, 112.4, 124.7, 128.0, 128.2, 128.4, 134.3, 138.7, 140.7,
141.2, 147.9, 148.9 (Ar-C); anal. calc. for C47H62GaIN2O: C 65.06%,
H 7.20%, N 3.23%, found: C 65.23%, H 7.36%, N 3.36%.

[K{Ga(NON)}]2. To a suspension of KC8 (0.103 g, 0.762 mmol) in toluene
(10 ml) was added a solution of (NON)GaI (0.300 g, 0.346 mmol) in toluene
(10 ml) at room temperature. The reaction mixture was stirred for 16 h at
room temperature before filtration. The filtrate was concentrated in vacuo
to around 5 ml and slowly cooled to −30 °C overnight to give
[K{Ga(NON)}]2 as yellow/orange crystals (0.178 g, 66%). 1H NMR (400 MHz,
C6D6, 298 K): δ = 1.04 (d, 3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.11 (d,
3JHH = 6.8 Hz, 12H, CH(CH3)2), 1.32 (s, 18H, C(CH3)3), 7S (s, 6H,
XA-C(CH3)2), 3.67 (sept., 3JHH = 6.8 Hz, 4H, CH(CH3)2), 6.11 (d,
4JHH = 1.6 Hz, 2H, XA-o-CH), 6.78 (s, 4JHH = 1.6 Hz, 2H, XA-p-CH), 6.90-7.16
(m, 6H, ArH); 13C{1H} NMR (101 MHz, C6D6): δ = 24.6, 25.7 (CH(CH3)2), 28.4
(CH(CH3)2), 28.5 (XA-C(CH3)2), 32.2 (C(CH3)3), 35.0 (C(CH3)3), 37.2
(XA-C(CH3)2), 105.5, 108.4, 124.1, 124.3, 127.9, 128.2, 133.6, 143.1, 143.2,
146.4, 148.9, 149.5 (Ar-C); anal. calc. for C94H124Ga2K2N4O2: C 72.39%,
H 8.01%, N 3.59%, found: C 72.07%, H 7.85%, N 3.36%.

X-ray crystallographic studies. Single-crystal X-ray diffraction data were
collected using an Oxford Diffraction Supernova dual-source diffractometer
equipped with a 135 mm Atlas CCD area detector. Crystals were selected under
Paratone-N oil, mounted on Micromount loops and quench-cooled using an
Oxford Cryosystems open flow N2 cooling device30. Data were collected at
150 K (unless otherwise stated) using either mirror monochromated Cu Kα
radiation (λ = 1.5418 Å; Oxford Diffraction Supernova) or Mo Kα radiation
(λ = 0.71073 Å; Oxford Diffraction Supernova). Data collected on the Oxford
Diffraction Supernova diffractometer were processed using the CrysAlisPro
package, including unit-cell parameter refinement and inter-frame scaling
(which was carried out using SCALE3 ABSPACK within CrysAlisPro)31.
Equivalent reflections were merged and diffraction patterns processed with
the CrysAlisPro suite. Structures were subsequently solved using SHELXT 2014
and refined on F2 using the SHELXL 2014 package and ShelXle or XSeed32,33.

Computational details. All computations reported here were performed at the
DFT level with the Gaussian09 (Revision D.01) program package34. Geometry
optimizations were performed with the PBE1PBE exchange correlation
functional35-37, using def-TZVP basis sets38, with Grimme’s empirical
dispersion correction (DFT-D3)39,40. The natural bond orbital (NBO) analyses
were performed using NBO 5.9 as implemented in Gaussian09 (ref. 41).
Graphics were created with the program GaussView (ref. 42). The geometry
optimization calculations were performed for model systems (tBu and iPr
groups replaced by Me) in the case of the dimer [K{Al(NON′)}]2 to reduce
computational cost, and full ligand systems were used in the calculations of
the monomers [Al(NON′)]− and Al(Nacnac)Dipp. In addition, [Al(NON′)]− and
Al(Nacnac)Dipp were optimized in the C, point group and the dimer
[K{Al(NON′)}]2 in the Ci point group. The nature of stationary points found
(minimum) was in most cases confirmed by full frequency calculations; the
optimized structure of the dimer [K{Al(NON′)}]2, however, has one imaginary
frequency of only 9i cm−1 corresponding to the vibration of the whole
system. Unfortunately, we were unable to find a global minimum with several
optimization attempts (using tighter convergence criteria, no symmetry, or
different starting geometries). However, if the dimer was optimized without
the empirical dispersion correction, a minimum could be found, but the
energy of the system was found to be almost 600 kJ mol−1 higher than that of
the [K{Al(NON′)}]2 structure optimized using dispersion correction. This
highlights the importance of dispersion in these systems. Consequently, we
performed the NBO analysis on the structure obtained from optimization with
dispersion correction.

Data availability. The spectroscopic data that support the findings of this
study are available from the corresponding authors upon reasonable request.
X-ray crystallographic data for (NON)AlI, [Al(NON)]2, [K{Al(NON)}]2,
[K{H2Al(NON)}]2, [K{Ph(H)Al(NON)}]2, (NON)AlH, (NON)AlMe,
(Nacnac)MesMgAl(NON), (NON)GaI and [K{Ga(NON)}]2 are available in the
Supplementary Information and from the Cambridge Crystallographic Data
Centre (https://www.ccdc.cam.ac.uk/) under reference numbers
1581591-1581600.

Extended Data Fig. 1 | Molecular structure of (NON)AlI as determined by
X-ray crystallography. Hydrogen atoms have been omitted and selected carbon
atoms shown in wireframe format for clarity; thermal ellipsoids have been
drawn at the 35% probability level. Key bond lengths (Å) and angles (°):
Al(1)-I(1) 2.497(1), Al(1)-N(1) 1.846(2), Al(1)-N(2) 1.846(2), Al(1)-O(1)
1.967(2), N(1)-Al(1)-N(2) 143.0(1).

Extended Data Fig. 2 | Molecular structure of [Al(NON)]2 as determined by
X-ray crystallography. Hydrogen atoms have been omitted and selected carbon
atoms shown in wireframe format for clarity; thermal ellipsoids have been
drawn at the 35% probability level. Key bond lengths (Å) and angles (°):
Al(1)-Al(2) 2.646(1), Al(1)-N(1) 1.902(2), Al(1)-N(2) 1.895(2), Al(2)-N(3)
1.901(2), Al(2)-N(4) 1.900(2), Al(1)-O(1) 1.976(2), Al(2)-O(2) 1.981(2),
N(1)-Al(1)-N(2) 119.0(1), N(3)-Al(2)-N(4) 118.6(1).

Extended Data Fig. 3 | Molecular structure of one of the molecules in the
asymmetric unit of [K{H2Al(NON)}]2 as determined by X-ray crystallography.
Second (essentially identical) component, benzene solvate molecules and
carbon-bound hydrogen atoms have been omitted and selected carbon atoms
shown in wireframe format for clarity; thermal ellipsoids have been drawn at
the 35% probability level. Key distances (Å) and angles (°): Al(1)-N(1/2)
1.933(2)/1.921(2), Al(2)-N(3/4) 1.934(2)/1.917(2), Al(1)-O(1) 2.131(1),
Al(2)-O(2) 2.124(2), Al(1)...Al(2) 6.356(1), Al(1)...K(1/2)
3.648(1)/4.065(1), Al(2)...K(1/2) 3.580(1)/4.039(1), Al(1)-H(1A/1B)
1.69(4)/1.55(4), Al(2)-H(2A/2B) 1.71(4)/1.58(4), N(1)-Al(1)-N(2) 130.3(1),
N(3)-Al(2)-N(4) 131.1(1).

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have been measured on samples as Nujol mulls; the blue asterisk highlights 


the AI-H stretching band of [K{H2Al(NON)}]>. 
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Extended Data Fig. 4 | Infrared spectra of [K{Al(NON)}]2 and 
[K{H,Al(NON)}]>. a, [K{Al(NON)}]. b, [K{H2Al(NON)}]>. Both spectra 
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Extended Data Fig. 5 | Molecular structure of [K{Ga(NON)}]2 as bond lengths and distances (A) and angles (°): Ga(1)...Ga(1’) 6.134(1), 
determined by X-ray crystallography. Hydrogen atoms have been Ga(1)...K(1) 3.970(1), Ga(1)...K(1’) 3.784(1), Ga(1)-N(1) 2.093(2), 
omitted and selected carbon atoms shown in wireframe format for clarity; Ga(1)-N(2) 2.106(2), Ga(1)-O(1) 2.542(2), N(1)—Ga(1)-N(2) 126.0(1). 
thermal ellipsoids have been drawn at the 35% probability level. Key 
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Extended Data Fig. 6 | Molecular structure of (NON) AIH-toluene as thermal ellipsoids have been drawn at the 35% probability level. Key bond 
determined by X-ray crystallography. Most hydrogen atoms have been lengths (A) and angles (°): Al(1)-N(1) 1.873(1), Al(1)-N(2) 1.872(1), 
omitted and selected carbon atoms shown in wireframe format for clarity; Al(1)-O(1) 1.944(1), Al(1)-H(1) 1.49(2), N(1)—Al(1)-N(2) 134.1(1). 
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Extended Data Fig. 7 | Molecular structure of [K{Ph(H)AI(NON)}]2 as 35% probability level. Key bond lengths (A) and angles (°): Al(1)-N(1) 
determined by X-ray crystallography. Most hydrogen atoms and benzene _1.945(2), Al(1)-N(2) 1.944(2), Al(1)-O(1) 2.122(1), Al(1)-C(48) 2.007(1), 
solvate molecules have been omitted, and selected carbon atoms shown Al(1)-H(1) 1.82(3), N(1)-Al(1)-N(2) 132.2(1). 

in wireframe format for clarity; thermal ellipsoids have been drawn at the 
Extended Data Table 1 | Selected X-ray data collection and refinement parameters for complexes prepared in this study

                                  (NON)AlI·toluene    [Al(NON)]2·2hexane·pentane   [K{Al(NON)}]2·2benzene   (NON)AlH·toluene
formula                           C54H70AlIN2O        C111H164Al2N4O2              C118H148Al2K2N4O2        C54H71AlN2O
Fw (g mol⁻¹)                      917.00              1640.41                      1786.56                  791.10
crystal system                    monoclinic          monoclinic                   monoclinic               monoclinic
space group                       P21/n               P21/c                        P21/n                    P21/n
a (Å)                             15.5890(4)          17.5615(2)                   12.7464(4)               15.3319(2)
b (Å)                             16.6528(5)          18.5034(2)                   24.0329(8)               18.4919(2)
c (Å)                             19.5742(4)          33.8933(5)                   18.0006(5)               17.1480(2)
α (°)                             90                  90                           90                       90
β (°)                             101.021(2)          102.156(1)                   101.359(3)               104.980(2)
γ (°)                             90                  90                           90                       90
V (Å³)                            4987.8(2)           10766.6(2)                   5406.2(3)                4696.51(11)
Z                                 4                   4                            2                        4
radiation, λ (Å)                  Cu Kα (1.54184)     Cu Kα (1.54184)              Cu Kα (1.54184)          Cu Kα (1.54184)
T (K)                             175(2)              200(2)                       100(2)                   150(2)
ρcalc (g cm⁻³)                    1.221               1.012                        1.098                    1.119
μ (mm⁻¹)                          5.491               0.588                        1.304                    0.662
reflections collected             57769               64900                        31957                    27914
independent reflections           10426               22254                        11147                    9730
parameters                        580                 1256                         696                      544
Rint                              0.0501              0.0325                       0.0308                   0.0273
R1 (I ≥ 2σ(I)/all data)           3.84/5.10           7.40/8.91                    7.22/8.14                3.98/4.75
wR2(F²) (I ≥ 2σ(I)/all data)      9.62/10.61          20.83/22.32                  18.23/19.10              10.18/10.80
GOF                               1.017               1.086                        1.038                    1.030
CCDC deposition number            1581591             1581595                      1581592                  1581598

                                  (NON)AlMe·1.5toluene   (Nacnac)MgAl(NON)   [K{H2Al(NON)}]2·3.25toluene
formula                           C58.5H77AlN2O          C70H91AlMgN4O       C113.5H147.5Al2K2N4O2
Fw (g mol⁻¹)                      851.20                 1055.75             1732.01
crystal system                    triclinic              monoclinic          triclinic
space group                       P-1                    P21/c               P-1
a (Å)                             11.3959(3)             22.3166(5)          19.2554(3)
b (Å)                             19.1112(4)             13.6123(3)          20.2365(5)
c (Å)                             26.1596(4)             22.5503(5)          28.1461(4)
α (°)                             99.313(2)              90                  86.636(2)
β (°)                             96.019(2)              108.156(2)          89.945(1)
γ (°)                             106.398(2)             90                  72.808(2)
V (Å³)                            5324.9(2)              6509.3(3)           10457.8(4)
Z                                 4                      4                   4
radiation, λ (Å)                  Cu Kα (1.54184)        Cu Kα (1.54184)     Cu Kα (1.54184)
T (K)                             200(2)                 150(2)              100(2)
ρcalc (g cm⁻³)                    1.062                  1.077               1.100
μ (mm⁻¹)                          0.614                  0.687               1.333
reflections collected             61949                  39799               118805
independent reflections           22050                  13468               43335
parameters                        1439                   748                 2867
Rint                              0.0433                 0.0505              0.0295
R1 (I ≥ 2σ(I)/all data)           6.89/8.57              5.35/6.63           5.71/7.06
wR2(F²) (I ≥ 2σ(I)/all data)      19.12/21.49            13.76/15.20         15.43/16.73
GOF                               1.032                  1.035               1.017
CCDC deposition number            1581597                1581593             1581600

                                  [K{Ph(H)Al(NON)}]2·6benzene   (NON)GaI·toluene   [K{Ga(NON)}]2·4toluene
formula                           C142H172Al2K2N4O2             C54H70GaIN2O       C122H156Ga2K2N4O2
Fw (g mol⁻¹)                      2098.99                       959.74             1928.14
crystal system                    triclinic                     monoclinic         triclinic
space group                       P-1                           P21/n              P-1
a (Å)                             13.9496(7)                    15.5521(6)         12.3217(5)
b (Å)                             14.3853(7)                    16.6088(9)         14.6039(7)
c (Å)                             17.5433(8)                    19.5714(7)         17.8128(7)
α (°)                             89.002(4)                     90                 71.255(4)
β (°)                             71.939(4)                     100.820(3)         87.125(4)
γ (°)                             67.043(5)                     90                 65.757(4)
V (Å³)                            3060.2(3)                     4965.5(4)          2755.2(2)
Z                                 1                             4                  1
radiation, λ (Å)                  Cu Kα (1.54184)               Cu Kα (1.54184)    Mo Kα (0.71073)
T (K)                             100(2)                        150(2)             150(2)
ρcalc (g cm⁻³)                    1.139                         1.284              1.162
μ (mm⁻¹)                          1.223                         5.916              [not legible in source]
reflections collected             32283                         29243              31412
independent reflections           12661                         10229              14357
parameters                        704                           549                656
Rint                              0.0259                        0.0576             0.0456
R1 (I ≥ 2σ(I)/all data)           4.64/5.63                     4.98/6.25          5.45/9.60
wR2(F²) (I ≥ 2σ(I)/all data)      12.16/12.93                   12.61/13.87        9.97/11.82
GOF                               1.041                         1.036              1.022
CCDC deposition number            1581599                       1581596            1581594
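
The tabulated cell volumes and calculated densities follow from the cell parameters, Z and the formula weight by standard crystallographic relations. The short sketch below is an illustrative consistency check only, not part of the authors' refinement workflow; the function names are our own, and the inputs are the values listed for (NON)AlMe·1.5toluene (cell lengths and angles) and (NON)AlI·toluene (Z, Fw, V).

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms from lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic formula; reduces to a*b*c*sin(beta) for a monoclinic cell.
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def calc_density(z, fw, volume):
    """Calculated density in g cm^-3 (1 cubic angstrom = 1e-24 cm^3)."""
    return z * fw / (AVOGADRO * volume * 1e-24)


# Triclinic cell listed for (NON)AlMe·1.5toluene; tabulated V is 5324.9(2).
v_alme = cell_volume(11.3959, 19.1112, 26.1596, 99.313, 96.019, 106.398)

# Z, Fw and V listed for (NON)AlI·toluene; tabulated density is 1.221.
rho_ali = calc_density(4, 917.00, 4987.8)

print(f"V = {v_alme:.1f} cubic angstroms, density = {rho_ali:.3f} g cm^-3")
```

Small differences from the tabulated values are expected, because the table reports rounded cell parameters.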
Complete Ichthyornis skull illuminates mosaic assembly of the avian head

Daniel J. Field, Michael Hanson, David Burnham, Laura E. Wilson, Kristopher Super, Dana Ehret, Jun A. Ebersole &
Bhart-Anjan S. Bhullar


The skull of living birds is greatly modified from the condition 
found in their dinosaurian antecedents. Bird skulls have an 
enlarged, toothless premaxillary beak and an intricate kinetic system 
that includes a mobile palate and jaw suspensorium. The expanded 
avian neurocranium protects an enlarged brain and is flanked by 
reduced jaw adductor muscles. However, the order of appearance of 
these features and the nature of their earliest manifestations remain 
unknown. The Late Cretaceous toothed bird Ichthyornis dispar 
sits in a pivotal phylogenetic position outside living groups: it is 
close to the extant avian radiation but retains numerous ancestral 
characters. Although its evolutionary importance continues to
be affirmed, no substantial new cranial material of I. dispar has
been described beyond incomplete remains recovered in the 1870s.
Jurassic and Cretaceous Lagerstätten have yielded important avialan
fossils, but their skulls are typically crushed and distorted. Here we
report four three-dimensionally preserved specimens of I. dispar— 
including an unusually complete skull—as well as two previously 
overlooked elements from the Yale Peabody Museum holotype, 
YPM 1450. We used these specimens to generate a nearly complete 
three-dimensional reconstruction of the I. dispar skull using high- 
resolution computed tomography. Our study reveals that I. dispar 
had a transitional beak—small, lacking a palatal shelf and restricted 
to the tips of the jaws—coupled with a kinetic system similar to that 
of living birds. The feeding apparatus of extant birds therefore 
evolved earlier than previously thought and its components were 
functionally and developmentally coordinated. The brain was 
relatively modern, but the temporal region was unexpectedly 
dinosaurian: it retained a large adductor chamber bounded dorsally 
by substantial bony remnants of the ancestral reptilian upper 
temporal fenestra. This combination of features documents that 
important attributes of the avian brain and palate evolved before 
the reduction of jaw musculature and the full transformation of 
the beak. 

The adaptability of the avian skull is manifest in its considerable
functional disparity across living birds. However, the earliest
appearances of several avian cranial innovations and their sequence
of acquisition are incompletely known, owing to the uneven and
largely two-dimensional preservation of early ornithuran cranial
material. We report a newly discovered and nearly complete
skull of the iconic early ornithuran I. dispar housed at the Sternberg 
Museum of Natural History (FHSM). One of us (K.S.) discovered the 
specimen (FHSM VP-18702) in 2014 near Castle Rock in Gove County 
(Kansas); it derives from the base of lithostratigraphic marker unit 10 in 
the Smoky Hill Member of the Niobrara Formation (Middle Santonian 
stage, Late Cretaceous). 

We have assembled an almost complete three-dimensional reconstruction
of the skull of I. dispar (Figs. 1, 2, Extended Data Figs. 1-4
and Supplementary Information 1), using the FHSM VP-18702 specimen,
the holotype (YPM 1450) as well as elements from three other
undescribed specimens (from the Alabama Museum of Natural History
(ALMNH 3316), the University of Kansas Biodiversity Institute (KUVP
119673) and the Black Hills Institute of Geological Research (BHI
6421)), all of which we refer to I. dispar using multiple postcranial
autapomorphies (Supplementary Fig. 1). The skull of I. dispar (Figs. 1, 2)
illustrates a transitional point in the evolutionary history of birds
stemward of Hesperornithiformes, a phylogenetic position that, after adding
new characters derived from the present study, we recover consistently
with the majority of recent analyses (Fig. 3a; see Supplementary
Information 1, 2).

The upper margin of the beak is concave in profile, which is a
derived condition that is shared by living birds and I. dispar but not by
stemward ornithuromorphs (Fig. 3b, Extended Data Figs. 5-10 and
Supplementary Videos 1, 5). As in Aves, the fused, toothless premaxillae
are acutely pointed and have a terminal hook. They occupy only the
anteriormost quarter of the rostrum; short premaxillae restricted to the
tip of the snout also characterize stemward ornithuromorphs, such as
Gansus and Iteravis, and therefore appear to have been the primitive
form of the avian beak (Fig. 3b). Neurovascular foramina indicate
the presence of a highly keratinized region of rhamphotheca called the
premaxillary nail, which would have enveloped and accentuated the
terminal hook. Osteological correlates for the additional rhamphothecal
plates of extant birds are absent; therefore, we infer that the nail
alone was the original keratinized beak. The ventral surface of the
premaxillae is vaulted dorsally with a median ridge for a soft-tissue
internarial septum (Figs. 1, 2, 3b, Extended Data Figs. 6, 8 and
Supplementary Videos 1, 5), which represents a plesiomorphic
dinosaurian condition that is dissimilar to that of Hesperornithiformes
and Aves, in which a flat palatal shelf is covered by rhamphotheca.
The posterior half of the palatal surface is dimpled by three pits on each
side for lower teeth (Fig. 4b). The portion of the premaxillae rostral to
these pits would have interacted in a pincer-like fashion with a similarly
pointed predentary bone.

Previously known only from a small fragment, the maxilla is
plesiomorphically long (Figs. 1-3 and Supplementary Videos 1, 7). Teeth
occupy sockets along the entire length of the maxilla with no indication
of reduction in size relative to stemward Avialae; rows of pits run
along the palate, medial to these teeth, to accommodate the mandibular
dentition. Interdental ossifications develop through ontogeny, which
represents a primitive condition that is lost in Hesperornithiformes,
in which small teeth are set in a continuous sulcus (Fig. 3b).
The maxillae are robust in lateral view compared to those of crown
birds, in which these bones are reduced to flat, predominantly palatal
elements. However, I. dispar shares with more crownward avialans
extensive maxillary shelves that form a bony palate (Figs. 1, 2 and
Supplementary Video 1).

We discovered a substantially complete lacrimal and nasal in material
composing the holotype, YPM 1450 (Figs. 1, 2, Extended Data Fig. 10
and Supplementary Video 4). The lacrimal, which is rarely intact in
fossil avialans, is similar to that of crown-clade birds in being curved
caudally and perforated by a capacious lacrimal foramen. The frontals,
nasal and premaxillae interdigitated with one another on the dorsal
surface of the skull such that, similar to palaeognath birds and unlike
neognath birds, I. dispar lacked a transverse naso-frontal hinge.
A sulcus on the rostral half of the maxilla (Figs. 1, 2 and Extended
Data Fig. 1) indicates a broad naso-maxillary contact and a
correspondingly broad postnarial bar. This condition resembles that in
Neognathae, which suggests that the narrow or absent contact and bar
of the Palaeognathae are derived within crown birds.

Fig. 1 | Full 3D reconstruction of the skull of I. dispar. A full 3D
reconstruction of the skull of I. dispar is shown on the left. High-resolution
scans of the right 11th mandibular tooth of YPM 1450 are shown on the right.

The palatine, rarely preserved in Mesozoic avialans, is narrow and
elongate (Figs. 1, 2, Extended Data Fig. 6 and Supplementary Video 1),
and is similar to that of crown birds in being unsutured to the maxilla.
The next-most-crownward examples of interpretable palatines are from
the enantiornithine Gobipteryx minuta, in which they are broad and
flat with extensive maxillary sutures.

The crown-bird-like palatine of I. dispar was linked by the
unpreserved pterygoid to a quadrate that is essentially indistinguishable
from that of crown birds (Figs. 1, 2, Extended Data Figs. 5-7, 10 and
Supplementary Video 6). As with the quadrates of certain Neognathae
(Anseriformes, Columbiformes and the Cretaceous Vegavis), the
quadrate of I. dispar exhibits two rounded capitular condyles that fit
into cotyles on the prootic and squamosal bones to form a mobile joint
with the cranium; this raises the possibility that a bicondylar
morphology is plesiomorphic for Aves. Both constituents of the
quadrato-maxillary bar, the quadratojugal and jugal, are preserved in
the new material (Figs. 1, 2, Extended Data Figs. 5, 6 and Supplementary
Video 3). The articular surface of the quadratojugal would have formed
a mobile joint with the quadrate. The jugal is deep in lateral view, and
is dissimilar to the rod-shaped jugal of most crown birds. The tandem
arrangement of the rostrum, jugal, and quadratojugal, the mobile
suspensorium and the narrow, linear palatine all indicate that I. dispar
possessed a fully functional avian cranial kinetic system, the most
stemward known occurrence of this key evolutionary innovation.

The almost complete postorbital cranium of the FHSM VP-18702
skull (Figs. 1, 2, 4, Extended Data Fig. 6 and Supplementary Videos 2, 3)
includes a mesethmoid interorbital septum that terminates at the
anterior end of the frontals, as is the case in Hesperornithiformes and
Neognathae but not in Palaeognathae. Thus, the palaeognath condition,
in which the mesethmoid extends forward to form part of the
internarial septum, may be autapomorphic.

The endocranial cavity in sagittal section is essentially crown-like
(Fig. 4a). The forebrain was enlarged and posteroventrally rotated and
the optic lobes were inflated and laterally shifted, as in living birds and
the putative ornithurine Cerebavis cenomanica.

The expansive upper temporal fenestra, on the other hand, is unlike
that of living birds. It is enclosed almost entirely by bone and the
jaw adductor muscles within would have been substantial (Figs. 1,
2, 4b). This apparently plesiomorphic configuration is similar to
that of deinonychosaurs and is entirely unexpected in an avialan that
is crownward of Enantiornithes. Most extant birds have a reduced
adductor chamber devoid of external skeletal boundaries, although
some derived neoavians that capture prey underwater, such as
cormorants and penguins, have secondarily enlarged adductor
attachments on the skull roof. The anterior margin of the fenestra in
Ichthyornis is bounded by an extensive postorbital ossification. This
postorbital process resembles the separate postorbital bone of
non-avialan dinosaurs in extending laterally and then posteriorly (Fig. 4b
and Supplementary Video 2); its condition is dissimilar to that of the
postorbital process of crown-clade birds, which possesses an apex
that, even when secondarily enlarged or in contact with the
squamosal (as in multiple highly nested clades), is directed ventrally,
at a right angle to the vanished upper temporal bar. Because
the postorbital region of the skull is generally preserved poorly in
stemward Avialae, it is unclear whether the postorbital ossification in
Ichthyornis is secondarily enlarged (although, if so, it is enlarged in a
way that is unseen in all known crown-clade birds and that is
markedly convergent on non-avialan theropods; Fig. 4) or represents the
late retention of an ancestral dinosaurian condition. The latter would
imply evolutionary fusion of the postorbital bone to the skull roof or
the retention of its shape, perhaps mediated by the persistence of its
membranous embryonic precursor, despite replacement by ossified
extensions from the calvarium and the laterosphenoid. The
crown-bird-like contribution of the laterosphenoid to the postorbital process
suggests that the associated temporal musculature exhibited a derived
configuration.

The squamosal, which to our knowledge is preserved intact in no
other described Mesozoic avialan specimen between Archaeopteryx
and Hesperornis, is peripheral to the braincase, unlike that of extant
birds. The squamosal exhibits a plesiomorphic,
deinonychosaur-like morphology unseen in crownward taxa: the zygomatic
process widely encircles the posterior half of the upper temporal fossa,
first projecting laterally and then curving anteriorly in such a way
that the process is hooked and directed rostrally to form, with the
postorbital process, a nearly complete upper temporal bar broken
only by a small unossified and probably ligamentous gap (Fig. 4b).
In lateral view, the zygomatic process is deep and triangular. As in
non-avialan dinosaurs, the nuchal crest along the suture between
the parietal and occiput extends from the cranial midline onto the
zygomatic process, forming the upper edge of the squamosal bone
(Figs. 1, 2, 4b). In its encirclement of the adductor chamber, the
squamosal recalls that of much more stemward, non-ornithuran
theropods (Fig. 4), in which a complete upper temporal arch is present
as a retention of the ancestral diapsid condition. The nearly
complete upper temporal bar suggests plesiomorphic architecture and
topology of the muscles attaching to the posterior part of the upper
temporal fossa, which include the majority of the adductor externus
complex.

Fig. 2 | Line drawings of the skull of I. dispar. Solid lines indicate areas
known from fossil specimens, dashed lines indicate unknown areas
reconstructed from other ornithuran birds. an, angular (os angulare);
ar, articular; bo, basioccipital; bs, basisphenoid; co, coronoid
(os coronoideum); de, dentary (os dentale); eo, exoccipital; ep, epiotic; fr,
frontal (os frontale); ju, jugal; la, lacrimal (os lacrimale); ls, laterosphenoid;
me, mesethmoid (os mesethmoidale); mx, maxilla; na, nasal (os nasale);
op, opisthotic; pa, parietal; pl, palatine; pm, premaxillae; po, prootic;
pr, prearticular (os prearticulare); pt, pterygoid; qj, quadratojugal
(os quadratojugale); qu, quadrate (os quadratum); sa, surangular
(os supra-angulare); so, supraoccipital; sp, splenial (os spleniale);
sq, squamosal; vo, vomer.

All parts of the Ichthyornis skull provide insights into the form and
function of the ancestral ornithuran head and the transition from
an early avialan to an avian condition. The pincer-like action of a
sharp-tipped, toothless beak would have facilitated fine manipulation
and preening, essentially performing the role of a surrogate
hand as the hands themselves became bound up into wings. Holding
and perforation of prey probably fell to the sizeable, reptilian tooth
row retained in I. dispar. The concurrent appearance of a
crown-grade avian kinetic apparatus would have enabled further precision
in grasping and an expanded gape. Simultaneous appearance of the
beak and kinetic palate is consistent with evidence for a deep
molecular developmental linkage between the fusion of the premaxillary
beak and the slimming and detachment of the palatines in the roof
of the mouth; thus, the kinetic apparatus is functionally and
developmentally integrated. If the kinetic apparatus was indeed produced
by a discrete shift in molecular patterning, its evolutionary
appearance could have been fairly abrupt or saltational. Whereas the form
of the beak supports one embryologically derived hypothesis, the
coexistence of a primitive adductor chamber and a derived,
crown-bird-like brain challenges another: the previous suggestion that brain
enlargement drove adductor reduction owing to spatial restriction
during embryonic development. Finally, the essentially avian brain
of I. dispar coincides with a flight apparatus resembling that of strong
fliers among living birds; evolutionary elaboration of the bird brain
may have been in service to the exigencies of avian flight, the most
sophisticated and demanding form of locomotion in the history of
vertebrate life.

Fig. 3 | Relationships of I. dispar and the origin of the avian beak.
a, Cladogram showing the phylogenetic position of I. dispar inferred on
the basis of our analyses (abridged and based in part on new codings in
multiple datasets; see Supplementary Information 1 for full phylogenetic
results). b, Origin of the avian beak. Line drawings of the rostrum of key
Mesozoic avialans and crown birds (see a for phylogenetic position).
Hypothesized extent of rhamphotheca is indicated in grey. Points a-e on
the rostrum of A. lithographica denote landmarks used for comparative
measurements presented in Supplementary Information 1. Inset shows the
occlusal gap between the tips of the premaxillae (red) and dentaries (blue)
in ventral view, inferred to have been occupied by a predentary.

Fig. 4 | Derived brain shape and primitive temporal region of I. dispar.
a, Sagittal cutaway of the braincase of I. dispar revealing the endocranial
space. Cavities labelled for brain divisions as follows: tc., telencephalon
(forebrain); te. o., tectum opticum (optic lobe); cb., cerebellum.
b, Comparative views of the temporal region of a nonavian dinosaur
(Zanabazar junior), I. dispar and a crown bird (Andean tinamou,
Nothoprocta pentlandii). Red arrows indicate medial embayment of the
upper temporal fenestra.


Online content 

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0053-y.
METHODS

I. dispar specimens housed at ALMNH, BHI, FHSM, KUVP and YPM were 
scanned at the University of Texas High-Resolution CT Facility (UTCT) and the 
Center for Nanoscale Systems at Harvard. Scan parameters and specimen details 
are presented in Supplementary Information 1. 

Scanned cranial material was digitally segmented using VGStudio MAX 
3.0, and 3D surface meshes were extracted and imported to MeshLab 2016 for 
optimization. Optimized meshes were then assembled into 3D reconstructions 
using Autodesk Maya 2017. 
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. Phylogenetic matrices and detailed images of anatomy are 
included as Supplementary Information. Three-dimensional models and data 
are archived and available on request from the Yale Peabody Museum of Natural 
History. All other data are available from the corresponding author upon reason- 
able request. 
Extended Data Fig. 1 | Full 3D reconstruction of the skull of I. dispar in high resolution. This is the same reconstruction as shown in Fig. 1,
reproduced at a higher resolution to show details.

Extended Data Fig. 2 | Reconstruction of the skull of I. dispar. Material described in this paper is indicated in gold and previously described regions
are indicated in grey. All elements are scaled to the size of the FHSM VP-18702 specimen.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Fig. 3 | Reconstruction of the skull of I. dispar
indicating the material represented by every known Ichthyornis
specimen. All elements are scaled to the size of the FHSM VP-18702
specimen. Specimen numbers in bold are those used in the reconstruction.
Numbers in italics indicate preservation of the same element in additional
specimens.

Extended Data Fig. 4 | High-resolution line drawing of the skull of
I. dispar. This is the same image as shown in Fig. 2, reproduced at a larger
size to show details. All anatomical abbreviations are as indicated in
Fig. 2. Solid lines indicate areas known from fossil specimens, and
dashed lines indicate unknown areas reconstructed from other ornithuran
birds.

Extended Data Fig. 5 | Skull and jaw elements of I. dispar specimen BHI 6421.

Extended Data Fig. 6 | Skull and jaw elements of I. dispar specimen FHSM VP-18702.

Extended Data Fig. 7 | Skull and jaw elements of I. dispar specimen KUVP 119673.

Extended Data Fig. 8 | Skull and jaw elements of I. dispar specimen ALMNH 3316.

Extended Data Fig. 9 | Skull and jaw elements of I. dispar holotype YPM 1450 showing the nasal and lacrimal elements that have not previously
been reported.

Extended Data Fig. 10 | Skull and jaw elements of I. dispar specimens YPM 1728, YPM 1459, YPM 1775 and YPM 1749.
Deep mitochondrial origin outside the sampled alphaproteobacteria

Joran Martijn, Julian Vosseberg, Lionel Guy, Pierre Offre & Thijs J. G. Ettema


Mitochondria are ATP-generating organelles, the endosymbiotic
origin of which was a key event in the evolution of eukaryotic cells.
Despite strong phylogenetic evidence that mitochondria had an
alphaproteobacterial ancestry, efforts to pinpoint their closest
relatives among sampled alphaproteobacteria have generated
conflicting results, complicating detailed inferences about the
identity and nature of the mitochondrial ancestor. While most
studies support the idea that mitochondria evolved from an ancestor
related to Rickettsiales, an order that includes several
host-associated pathogenic and endosymbiotic lineages, others have
suggested that mitochondria evolved from a free-living group.
Here we re-evaluate the phylogenetic placement of mitochondria. 
We used genome-resolved binning of oceanic metagenome datasets 
and increased the genomic sampling of Alphaproteobacteria with 
twelve divergent clades, and one clade representing a sister group 
to all Alphaproteobacteria. Subsequent phylogenomic analyses that 
specifically address long branch attraction and compositional bias 
artefacts suggest that mitochondria did not evolve from Rickettsiales 
or any other currently recognized alphaproteobacterial lineage. 
Rather, our analyses indicate that mitochondria evolved from a 
proteobacterial lineage that branched off before the divergence of 
all sampled alphaproteobacteria. In light of this new result, previous 
hypotheses on the nature of the mitochondrial ancestor should
be re-evaluated.

Mitochondria are important organelles in eukaryotic cells and are 
involved in various processes, of which ATP generation through oxida- 
tive phosphorylation is a hallmark feature. The endosymbiotic origin 
of these organelles represents a strongly debated step in eukaryotic 
evolution!” that has contributed to the emergence of the cellular com- 
plexity that characterizes modern eukaryotes’. To trace the evolution- 
ary history of mitochondria and their role in eukaryogenesis, detailed 
knowledge about the identity and nature of the mitochondrial ancestor 
is of great importance. However, despite the fact that the alphaprote- 
obacterial origin of mitochondria is generally undisputed, efforts to 
resolve the phylogenetic position of mitochondria in the alphaprote- 
obacterial species tree have failed to reach consensus owing to several 
complications. First, because mitochondria and several independent 
alphaproteobacterial lineages have been subjected to accelerated rates 
of evolution, they are sensitive to long branch attraction (LBA) artefacts 
in which fast-evolving lineages incorrectly group together. Second, 
because alphaproteobacterial and mitochondrial sequences display a 
high degree of compositional heterogeneity, they are sensitive to the 
compositional bias artefact in which unrelated lineages with similar 
sequence compositions falsely group together. Finally, the current 
sample of alphaproteobacterial genomes is biased towards taxa that 
are clinically or agriculturally relevant or can be cultivated in a labora- 
tory setting. It does not reflect the natural diversity of extant alphapro- 
teobacteria and might exclude potential close relatives of mitochondria. 
Here we attempt to resolve the phylogenetic origin of mitochondria 
by addressing the issues of long branch attraction, compositional bias 
and biased taxon sampling simultaneously. We tried to remove these 
phylogenetic artefacts by applying models of sequence evolution that 
account for site-specific substitution patterns, by reducing compo- 
sitional heterogeneity of molecular sequence data and by increasing 
taxon sampling in a more unbiased manner through genome-resolved 
metagenomic binning. 

We screened all publicly available Tara Oceans metagenomic datasets 
for novel alphaproteobacteria residing in the ocean's upper layers 
(see Methods) and selected three datasets corresponding to samples 
taken at 5 m, 115 m and 140 m depths in the Pacific Ocean (Fig. 1a). To 
also capture novel alphaproteobacteria residing in the deeper layers, we 
screened four metagenomic datasets originating from samples taken 
at 100 m, 776 m, 2,745 m and 5,002 m depths in the Atlantic Ocean 
(Fig. 1a). 

We assembled each dataset with recently developed metagenome 
assemblers and performed a phylogenomic analysis of all contigs 
that contained at least five out of fifteen ribosomal proteins located in 
the str-spc cluster (RP15). Collectively, the seven metagenomes contained 
a large diversity of alphaproteobacteria (Supplementary Fig. 1). 
In particular, they harboured several novel divergent lineages poten- 
tially useful for resolving the phylogenetic position of mitochondria 
(Fig. 1b and Supplementary Fig. 2). 

We reconstructed genomes from these lineages by binning metagen- 
omic contigs into metagenome-assembled genomes (MAGs) based on 
their differential sequence coverage across samples, tetranucleotide 
frequencies and read-pair linkage information. Forty-five MAGs were 
reconstructed that exhibited high completeness and low redundancy 
estimates, collectively representing twelve distinct alphaproteobacterial 
lineages (designated MarineAlpha1-12; Fig. 2 and Supplementary 
Table 1) and one additional proteobacterial lineage that forms a sister 
clade to the Alphaproteobacteria (designated MarineProteo1; Fig. 2). 
The genome sample of the obtained MAGs is extensive, as they covered 
most divergent groups observed in the RP15 phylogeny (Fig. 1b and 
Supplementary Fig. 2). None of the MAGs contained full-length 16S 
rRNA genes, a commonly observed issue. This prevented us from 
linking these lineages to alphaproteobacteria that have been previ- 
ously identified in 16S rRNA environmental surveys. The exception 
was MarineAlpha12, which lacks close relatives in the SILVA data- 
base (release 128; https://www.arb-silva.de/) to give it a meaningful 
taxonomic classification (‘uncultured Rhodospirillaceae’ at 92.9% 
similarity). 
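
Tetranucleotide frequencies, one of the three binning signals named above, can be illustrated with a minimal sketch. The function name and the choice to skip windows that contain ambiguous bases are assumptions made for illustration only; binning tools typically also merge reverse-complementary tetramers, which is left out here for brevity.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 tetranucleotides."""
    seq = sequence.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Skip windows containing N or other ambiguity codes.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


# Toy contig: seven overlapping windows, two of which are "ACGT".
profile = tetranucleotide_frequencies("ACGTACGTAC")
print(profile["ACGT"])
```

Contigs from the same genome tend to have similar profiles, which is why such vectors are combined with coverage information when clustering contigs into bins.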

Next, we determined the phylogenetic positions of the MAGs. 
We performed phylogenomic analyses on a concatenated dataset 
comprising 72 carefully selected genes that are conserved across 
Alphaproteobacteria (Supplementary Table 2). Bayesian and 
maximum-likelihood phylogenetic inferences of this dataset generally 
yielded species trees in which all AT-rich (<40% GC), long branch 
taxa (Rickettsiales, Pelagibacteraceae, alphaproteobacterium HIMB59, 
MarineAlpha2, 5-8 and MarineAlpha9 bins 1-4) formed a monophyl- 
etic group with maximum statistical support (Supplementary Figs. 3, 4). 
Such a topology strongly suggests that, despite the use of mixture models 
of evolution, these analyses suffered from LBA and compositional 
bias artefacts. Indeed, posterior predictive tests of the Bayesian analysis 
confirmed that the used model of evolution failed to adequately capture 
the level of compositional heterogeneity in the dataset (Supplementary 
Fig. 3 and Supplementary Table 3). 

Fig. 1 | Metagenomic exploration of oceanic alphaproteobacteria. 
Locations, depths and alphaproteobacterial phylogenetic diversity of 
the seven environmental samples. a, Sample depths and approximate 
locations. Coordinates (latitude, longitude) of the Atlantic Ocean samples 
were as follows: 100 m and 5,002 m: 10.91°, 315.33° (red); 776 m: 24.51°, 
325.8° (blue); 2,745 m: 20.12°, 319.16° (black). Coordinates of the Pacific 
Ocean samples can be found in Supplementary Table 6. b, Phylogenetic 
diversity of sampled alphaproteobacteria compared to a reference set 
of alphaproteobacteria and mitochondria based on an RP15 analysis 
(see Methods). Branches that have been shortened for clarity purposes are 
dotted. Caulob., Caulobacterales; Rhodob., Rhodobacterales. For a fully 
expanded tree see Supplementary Fig. 2. 

To ameliorate compositional bias artefacts, we reduced the compositional 
heterogeneity of the dataset by either recoding the data 
from a 20-character to a 4-character state, or by removing heterogeneous 
sites until all pairwise taxa combinations were considered 
homogeneous with a stationary-based trimmer. Phylogenetic 
analyses of the compositionally homogenized datasets appeared 
to be free of compositional bias artefacts: the Pelagibacteraceae, 
Rickettsiales, MarineAlpha9 bins 1-4 and MarineAlpha2 bin 1 each 
branched in distinct areas of the tree, away from alphaproteobacterium 
HIMB59 and MarineAlpha5-8 (Fig. 3, Supplementary Figs. 5-8 
and Supplementary Discussion). Posterior predictive tests of Bayesian 
analyses of the compositionally homogenized datasets indicated that 
the evolutionary model was either substantially closer to capturing 
the level of compositional heterogeneity or was able to capture it adequately 
(Supplementary Table 3). Reducing heterogeneity alleviates 
artefacts, but also reduces the informational content, which in turn 
leads to a decreased statistical support for some of the deeper nodes of 
the trees (Fig. 3 and Supplementary Figs. 5-8). Notably, the phylogenetic 
analyses of the compositionally homogenized datasets reveal a 
species tree in which the MAGs represent clades that do not branch 
closely with any of the previously established groups. Therefore, the 
addition of the MAGs captures a larger fraction of the alphaproteobacterial 
diversity. 
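
Recoding from a 20-character to a 4-character state can be sketched as a simple lookup. The grouping used below is the widely used SR4 scheme; this passage does not name the scheme, so treating it as SR4 is an assumption, as are the function and variable names.

```python
# SR4 groups (assumed scheme): each set of amino acids collapses to one state.
SR4_GROUPS = {
    "A": "AGNPST",
    "C": "CHWY",
    "G": "DEKQR",
    "T": "FILMV",
}
RECODE = {aa: state for state, group in SR4_GROUPS.items() for aa in group}


def recode_sr4(protein: str) -> str:
    """Map a protein sequence onto four states; gaps and unknowns become '-'."""
    return "".join(RECODE.get(aa, "-") for aa in protein.upper())


print(recode_sr4("MKV-X"))
```

Collapsing exchangeable amino acids into one state lowers compositional heterogeneity between taxa, at the cost of discarding substitutions that occur within a group.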

Using this improved genome sample, we re-evaluated the origin of 
mitochondria. To do so, we assembled a phylogenomics dataset that 
consisted of 24 genes, which are conserved in Alphaproteobacteria 
and in a set of gene-rich mitochondrial genomes from diverse 
eukaryotes. This dataset was analysed by using the abovementioned 
approaches to recover phylogenies minimally affected by phylogenetic 
artefacts. 

Phylogenetic inference from the untreated and therefore composi- 
tionally heterogeneous dataset yielded trees in which mitochondria 
branched together with other AT-rich and long branch taxa with 
high branch support (Supplementary Figs. 9, 10), suggesting that the 
topology was affected by compositional bias artefacts. Indeed, poste- 
rior predictive tests indicated that the model was unable to adequately 
capture the level of compositional heterogeneity (Supplementary 
Table 3). As shown for the alphaproteobacterial species tree, reducing 
the compositional heterogeneity reduced the number of artefactual 
phylogenetic relationships (Fig. 4a and Supplementary Figs. 11-13) 
and improved posterior predictive test results (Supplementary Table 3) 
at the expense of phylogenetic signal. For phylogenetic reconstructions 
in which the model was not able to capture heterogeneity adequately 
(Supplementary Table 3) (using the untreated and recoded datasets), 
mitochondria branched with Rickettsiales (Rickettsiales-sister) with 
moderate to high support (Supplementary Figs. 9, 13). By contrast, 
reconstructions in which the model was able to capture heterogene- 
ity adequately (Supplementary Table 3) (using the stationary-based 
trimmed dataset), mitochondria branched as a sister group to all alphaproteobacteria 
(not including Magnetococcales; Supplementary 
Discussion) (‘Alphaproteobacteria-sister’) with moderate support 
(Fig. 4a and Supplementary Fig. 11). Similarly, approximately unbiased 
tests favoured a Rickettsiales-sister topology when using the untreated 
dataset, and favoured an Alphaproteobacteria-sister topology when 
using the stationary-based trimmed dataset (Supplementary Table 4). 
However, it should be noted that the approximately unbiased test was 
unable to reject the Rickettsiales-sister topology for the stationary-based 
trimmed dataset (Supplementary Table 4). These analyses suggest 
that the frequently cited Rickettsiales-sister hypothesis may be the 
result of a compositional attraction and instead point towards an 
Alphaproteobacteria-sister hypothesis. 

Fig. 2 | Increased genomic sampling of alphaproteobacterial clades. 
Overview of all genomic bins obtained in this study. The subtrees shown 
on the left are scaled and are based on the tree shown in Fig. 3. ‘A-’ and 
‘P-’ denote Atlantic and Pacific samples, respectively. Phylogenetically 
‘identical’ bins used to build composite bins are shaded by pink boxes. Bars 
indicating the estimated redundancy (red) follow the same scale as bars 
indicating bin size (dark blue) and estimated genome size (light blue). 

The support for Alphaproteobacteria-sister in the Bayesian analysis 
was only moderately high (posterior probability support (PP) = 0.89). 
We postulated that the loss of phylogenetic information that accompanies 
stationary-based trimming was responsible for the lack of strong 
branch support. We therefore applied an alternative method (χ² trimmer; 
see Methods) that removes variable amounts of heterogeneous 
sites, and so allows one to retain more information. When we used the 
χ² trimmer to remove the 20% most heterogeneous sites (versus approximately 
42% by the stationary-based trimmer), we recovered a similar, 
but generally more highly supported topology in which the support for 
the Alphaproteobacteria-sister was stronger (PP = 0.97; Supplementary 
Fig. 14). Although the model did not capture the heterogeneity of the 
χ²-trimmed data adequately, it was substantially closer than in the analysis 
of the untreated data (Supplementary Table 3). Removing fewer 
heterogeneous sites thus maintained more phylogenetic information 
and was sufficient to reduce major compositional bias artefacts. The 
weaker support for the Alphaproteobacteria-sister hypothesis was therefore 
most likely caused by the loss of phylogenetic information. 
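
The idea of removing a chosen fraction of the most compositionally heterogeneous sites can be sketched as follows. The full description of the trimmer is not part of this excerpt, so the scoring used here (the drop in a pooled chi-square statistic when a site is deleted) and all names are assumptions for illustration, not the implementation used in the study.

```python
from collections import Counter


def chi2_score(alignment: list[str]) -> float:
    """Chi-square distance of each taxon's composition from the pooled one."""
    per_taxon = [Counter(ch for ch in seq if ch != "-") for seq in alignment]
    pooled = Counter()
    for counts in per_taxon:
        pooled.update(counts)
    total = sum(pooled.values())
    if total == 0:
        return 0.0
    score = 0.0
    for counts in per_taxon:
        n = sum(counts.values())
        for state, pooled_count in pooled.items():
            expected = n * pooled_count / total
            if expected > 0:
                score += (counts[state] - expected) ** 2 / expected
    return score


def chi2_trim(alignment: list[str], fraction: float) -> list[str]:
    """Drop the given fraction of sites that contribute most to the score."""
    n_sites = len(alignment[0])
    base = chi2_score(alignment)
    # A large positive delta means deleting the site lowers heterogeneity.
    delta = [
        base - chi2_score([s[:i] + s[i + 1:] for s in alignment])
        for i in range(n_sites)
    ]
    n_remove = int(n_sites * fraction)
    ranked = sorted(range(n_sites), key=lambda i: delta[i], reverse=True)
    drop = set(ranked[:n_remove])
    return ["".join(ch for i, ch in enumerate(s) if i not in drop)
            for s in alignment]


# Toy alignment: only the last column differs between taxa.
trimmed = chi2_trim(["AAAK", "AAAK", "AAAR"], fraction=0.25)
print(trimmed)
```

Because the fraction is a free parameter, a trimmer of this kind can remove fewer sites than a criterion that demands full homogeneity, which matches the trade-off between artefact reduction and retained signal discussed above.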

Fig. 3 | Updated alphaproteobacterial tree. Phylogenetic placement of 
all genomic bins relative to representatives of the genomically sampled 
alphaproteobacteria. Phylogenetic tree based on the stationary-trimmed 
alignment derived from the concatenation of 72 genes conserved across 
Alphaproteobacteria (Supplementary Table 2), inferred with PhyloBayes 
MPI under CAT + GTR + Γ4. PP, posterior probability support; NPB, 
non-parametric bootstrap support; UFB, ultrafast-bootstrap support. 
IQTREE was used to calculate NPB under the PMSF approximation of 
LG + C60 + F + Γ4 and UFB under LG + C60 + F + Γ4. The tree is rooted 
with representatives of Beta-, Gammaproteobacteria and Magnetococcales 
(not shown). For a fully expanded tree, including taxon-to-clade 
specifications, see Supplementary Fig. 6. 

To further assess the Alphaproteobacteria-sister and Rickettsiales-sister 
hypotheses, we investigated the influence of compositional 
heterogeneity on the placement of the mitochondria by progressively 
reducing compositional heterogeneity of the untreated dataset with 
the χ² trimmer and inferred maximum-likelihood phylogenies. We 
evaluated the support for Rickettsiales-sister and Alphaproteobacteria-sister 
hypotheses along the heterogeneity gradient in terms of non-parametric 
bootstrap measures and P values of the approximately 
unbiased test. The support for Rickettsiales-sister dropped whereas 
support for Alphaproteobacteria-sister rose strikingly as heterogeneity 
decreased (Fig. 4b). In addition, we inferred maximum-likelihood 
phylogenies from one additional dataset that included a wider diversity 
of mitochondria and two additional concatenated alignments of different 
gene sets (Supplementary Table 2). In the first gene set, we relaxed 
our gene selection criterion and added 14 mitochondrial-encoded 
genes that displayed a lesser degree of conservation compared to the 24 
originally selected genes (see Methods). The second gene set comprised 
29 nuclear-encoded genes that are thought to have been transferred 
from the mitochondrial genome to the nuclear genome early on in 
eukaryote evolution and may therefore have escaped the higher evo- 
lutionary rates and AT-biases that mitochondrial genomes have been 
generally subjected to. Thus, these nuclear-encoded genes are in theory 
less sensitive to phylogenetic artefacts than mitochondrial-encoded 
genes. Maximum-likelihood trees inferred from the increased mitochondrial 
diversity dataset and relaxed mitochondrial-encoded gene set 
displayed the same behaviour as the stringent mitochondrial-encoded 
gene set, in which support for the Rickettsiales-sister was replaced by 
support for the Alphaproteobacteria-sister upon reduction of composi- 
tional heterogeneity (Supplementary Figs. 15, 16, 22, 23, Supplementary 
Table 4 and Supplementary Discussion). Similarly, maximum- 
likelihood trees inferred from the untreated nuclear-encoded gene set 
recovered an Alphaproteobacteria-sister topology, despite the artificial 
grouping of all other AT-rich taxa (Supplementary Fig. 17). In addition, 
approximately unbiased tests rejected Rickettsiales-sister under 
both the LG + C60 model and its PMSF approximation (Supplementary 
Table 4). Maximum-likelihood trees inferred from the compositionally 
homogenized alignment of nuclear-encoded genes also recovered 
an Alphaproteobacteria-sister topology, while the AT-rich taxa no 
longer branched together (Supplementary Fig. 18). However, com- 
pared to maximum-likelihood trees inferred from the composition- 
ally homogenized alignment of mitochondrial-encoded genes, branch 
supports for the Alphaproteobacteria-sister were lower, and approx- 
imately unbiased tests failed to discriminate between Rickettsiales- 
and Alphaproteobacteria-sister hypotheses (Supplementary Table 4). 
We hypothesize that this could be the result of a lower phylogenetic 
signal in the compositionally homogenized alignment of nuclear- 
encoded genes relative to the compositionally homogenized alignment 
of mitochondrial-encoded genes. The nuclear-encoded gene set had 
more missing data among eukaryotic taxa (30% versus 12%), displayed 
relatively higher substitution rates among eukaryotic taxa (average 
root-to-tip substitution rate: 1.280 versus 1.083), and probably carries 
a less coherent signal because it consists of more biochemically independent 
genes (Supplementary Table 2). 

Fig. 4 | An early-branching mitochondrial ancestor. Phylogenetic 
placement of the mitochondria relative to representatives of the 
genomically sampled Alphaproteobacteria and genomic bins. 
a, Phylogenetic tree based on the stationary-trimmed alignment derived 
from the concatenation of 24 genes conserved across Alphaproteobacteria 
and gene-rich mitochondria (Supplementary Table 2), inferred with 
PhyloBayes MPI under CAT + GTR + Γ4. IQTREE was used to 
calculate NPB under the PMSF approximation of LG + C60 + F + Γ4 
and UFB under LG + C60 + F + Γ4. Tree is rooted with representatives 
of Beta-, Gammaproteobacteria and Magnetococcales (not shown). 
For a fully expanded tree, including taxon-to-clade specifications, see 
Supplementary Fig. 11. b, Evaluation of non-parametric bootstrap (top) 
and approximately unbiased (AU) test P value (bottom) support under the 
PMSF approximation of LG + C60 + F + Γ4 for Rickettsiales-sister and 
Alphaproteobacteria-sister hypotheses, as sites that contribute most to 
overall compositional heterogeneity (total χ² score) are removed. 

The Alphaproteobacteria-sister topology may be the result of an LBA 
between mitochondria and outgroup taxa. Indeed, our strategies do 
not directly accommodate heterotachy, which can lead to LBA. We carried 
out three independent analyses (outgroup removal, parametric 
simulations and random sequence replacements) and conclude that 
outgroup taxa are unlikely to have artificially attracted the mitochon- 
dria (Supplementary Discussion). 

In summary, we have obtained genome data for thirteen alphaproteobacteria-related 
clades and re-evaluated the phylogenetic origin 
of the mitochondria in the context of this expanded coverage of alphaproteobacterial 
diversity. While none of the newly identified lineages 
branched specifically with mitochondria, we observed that the 
improved taxonomic sampling in combination with methods that 
address long branch attraction and compositional bias artefacts resulted 
in a new phylogenetic position for mitochondria. Importantly, we show 
that a Rickettsiales-sister origin, as reported in previous studies, is most 
likely the result of a compositional bias artefact. Our analyses instead 
suggest that mitochondria diverged from Alphaproteobacteria before 
the diversification of all currently known alphaproteobacterial line- 
ages. We infer that Rickettsiales and mitochondria evolved via two 
independent endosymbiosis events, instead of the single shared endosymbiosis 
event that is suggested in studies that recover a Rickettsiales-mitochondria 
affiliation. An independent-endosymbioses scenario 
is more probable than the shared-endosymbiosis scenario: the latter 
requires the unlikely event of a hypothetical endosymbiotic ancestor 
of Rickettsiales and mitochondria escaping the host cell before engaging 
a new endosymbiosis that gives rise to the Rickettsiales. In light 
of this new hypothesis, previous inferences about the nature of the 
mitochondrial ancestor that were based on a Rickettsiales origin, such 
as the requirement of pathogen-like features to survive the host cell, 
the presence of flagella, cbb3-type oxidases, bd-type quinol oxidase 
and ATP/ADP translocase, will have to be re-evaluated. Finally, our 
hypothesis stands in direct contrast to all previous hypotheses that 
reported an origin within the Alphaproteobacteria, and implies 
that mitochondrial endosymbiosis may be more ancient than previously 
thought. Our study underscores that future efforts that aim to 
identify the sister group of mitochondria should harness the power 
of cultivation-independent methods to explore alphaproteobacterial 
diversity in an unbiased manner. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0059-5. 

METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Sources of metagenomic data. Metagenomes of the four Atlantic Ocean samples 
were previously published (Sequence Read Archive (SRA) accession number 
SRP081826). All 243 metagenome assemblies from a previously published study 
were downloaded from EBI (http://www.ebi.ac.uk/ena/about/tara-oceans-assemblies; 
Supplementary Table 5). The raw sequence data from all selected 
Tara Oceans samples (Supplementary Table 6) were downloaded from project 
ERP001736 on the EBI Metagenomics portal. 

RP15 pipeline. In brief, the RP15 pipeline aims to identify contigs within one or 
more metagenome assemblies that contain between five and fifteen ribosomal 
proteins that are typically encoded within the same str-spc cluster. Once a set of 
such ‘ribocontigs’ has been identified, their ribosomal proteins are aligned to the 
orthologous ribosomal proteins from a set of reference taxa. The separate align- 
ments are then trimmed and concatenated into a supermatrix that is subsequently 
used to infer a phylogenetic tree. Because ribosomal proteins are good phyloge- 
netic markers and bear more information than ribosomal RNA genes, the tree 
gives the user an estimate of the phylogenetic position of each ribocontig, and 
therefore of the overall phylogenetic diversity that can be found in the considered 
metagenome(s). 
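
The first step of the pipeline, selecting contigs that carry between five and fifteen of the cluster's ribosomal proteins, can be sketched as a simple filter. The input structure (a mapping from contig identifier to the marker names detected on it) and the placeholder marker names are hypothetical; the actual pipeline derives these hits from the homology searches described in the next paragraph.

```python
def find_ribocontigs(marker_hits: dict[str, set[str]],
                     min_markers: int = 5,
                     cluster_size: int = 15) -> dict[str, int]:
    """Return contigs with at least `min_markers` distinct RP15 markers."""
    ribocontigs = {}
    for contig, markers in marker_hits.items():
        n_distinct = len(set(markers))
        if n_distinct > cluster_size:
            raise ValueError(f"{contig}: more markers than the cluster holds")
        if n_distinct >= min_markers:
            ribocontigs[contig] = n_distinct
    return ribocontigs


# Placeholder marker names; the real set is the fifteen str-spc proteins.
hits = {
    "contig_1": {"rp01", "rp02", "rp03", "rp04", "rp05", "rp06"},
    "contig_2": {"rp01", "rp02"},
}
print(find_ribocontigs(hits))
```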

Open reading frames were predicted with Prodigal v.2.6.3, ribosomal proteins 
were detected with PSI-BLAST v.2.3.0+ using aligned ribosomal proteins from 
reference taxa as a query and aligned with MAFFT v.7.050b using the L-INS-i 
algorithm. Alignments were trimmed with trimAl v.1.4, and phylogenetic trees 
were inferred with RAxML v.8.2.8 (LG + Γ model) or FastTree v.2.1.9 (LG 
model). The reference taxa were either a set of 90 phylogenetically diverse bacteria 
and archaea (the ‘bacteria backbone’ based on Raymann et al.), or a set of 
phylogenetically diverse alphaproteobacteria (74), magnetococcales (2), betaproteobacteria 
(4), gammaproteobacteria (4) and mitochondria (12: Naegleria 
gruberi, Phytophthora infestans, Tetrahymena thermophila, Monosiga brevicollis, 
Malawimonas jakobiformis, Hemiselmis andersenii, Glaucocystis nostochinearum, 
Dictyostelium discoideum, Cyanidioschyzon merolae, Chlorokybus atmophyticus, 
Bigelowiella natans and Andalucia godoyi) (the ‘alphaproteobacteria backbone’). 
Metagenome assembly and binning of Atlantic Ocean samples. Each sequencing 
run of the four samples was processed with Trimmomatic v.0.33 to, in 
the stated order, trim read-through adapters 
(ILLUMINACLIP:TruSeq2-PE.fa:2:30:10:1:true), trim low quality base calls at the 
starts and ends of reads (LEADING:3, TRAILING:3), remove reads shorter than 
60 bp (30 bp for the 5,000-m sample) (MINLEN:60) and finally remove reads that 
have an average Phred score lower than 28 (100-m sample: 30) (AVGQUAL:28). 
The overall quality and presence of adaptor sequences in the unprocessed and 
processed reads was evaluated with FastQC v.0.11.4. 
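
The order of the trimming steps matters, because Trimmomatic applies them in the order given on the command line. A minimal sketch of assembling such a paired-end call is shown below; the step strings come from the text, whereas the jar name, the input and output file names and the helper functions are hypothetical.

```python
def trimmomatic_steps(minlen: int = 60, avgqual: int = 28) -> list[str]:
    """Trimming steps in the order stated in the text."""
    return [
        "ILLUMINACLIP:TruSeq2-PE.fa:2:30:10:1:true",  # read-through adapters
        "LEADING:3",                                  # low-quality read starts
        "TRAILING:3",                                 # low-quality read ends
        f"MINLEN:{minlen}",                           # drop short reads
        f"AVGQUAL:{avgqual}",                         # drop low-quality reads
    ]


def trimmomatic_command(jar: str, r1: str, r2: str, prefix: str,
                        **thresholds: int) -> list[str]:
    """Build a paired-end command line; output names are illustrative."""
    outputs = [
        f"{prefix}_R1.paired.fq.gz", f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz", f"{prefix}_R2.unpaired.fq.gz",
    ]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs,
            *trimmomatic_steps(**thresholds)]


# Per-sample exceptions from the text: MINLEN 30 (5,000-m), AVGQUAL 30 (100-m).
cmd = trimmomatic_command("trimmomatic-0.33.jar", "r1.fq.gz", "r2.fq.gz",
                          "sample_5000m", minlen=30)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run`, which avoids shell quoting issues with the colon-separated step arguments.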

Read pairs for which both members passed the preprocessing were assembled 
with IDBA-UD v.1.1.1, using the k-mers 20, 40, 60, 80 and 100. 

Upon assembly, all contigs were split into 10-kb pieces, unless the remaining 
fragment was smaller than 20 kb, with the CONCOCT script cut_up_fasta.py. 
Then preprocessed reads of each of the four samples were mapped onto these 
split-contigs with the CONCOCT script map-bowtie2-markduplicates.sh (Bowtie2 
v.2.1.0). Its output was transformed into a matrix containing differential coverage 
profiles for each split-contig using the CONCOCT script gen_input_table.py. 
CONCOCT v.0.4.0 then clustered the split-contigs larger than 1 kb into bins 
using their tetranucleotide frequency profiles and differential coverage profiles. 
Bins that corresponded to potential novel alphaproteobacteria were identified by 
checking which bins contained alphaproteobacterial ribocontigs (according to the 
RP15 tree). For each selected bin, we highlighted the location of its split-contigs 
on ESOM maps that were based on the tetranucleotide frequency profiles of each 
>5-kb split-contig of the entire assembled metagenome. The CONCOCT clustering 
showed a high degree of consistency with the ESOM clustering. Occasionally the 
split-contigs of two different CONCOCT bins would be located in the same area 
on the ESOM map. For such cases, the two CONCOCT bins would be merged 
into a single bin. Next, all bins of interest were assessed and cleaned with 
mmgenome, using information on differential coverage, GC composition, linkage 
and the presence of well-conserved bacterial genes. Linkage information, that is, which 
contigs are connected through read pairs, was obtained from the read mapping 
with the CONCOCT script bam_to_linkage.py (--fullsearch, --regionlength 500). 
Next, split-contigs were replaced by their corresponding full-length contigs. In 
case not all split-contigs from a corresponding full-length contig were present in 
a cleaned bin, the full-length contig was only included in the bin if the majority 
of its split-contigs was present. Finally, the >2 kb full-length contigs of each bin 
were projected onto the ESOM maps. If a contig was relatively isolated from the 
majority of contigs on the map, that is, it exhibited a large difference in terms of 
tetranucleotide frequencies, it was removed from the final bin. 
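
The splitting rule described at the start of this section (10-kb pieces, with the remainder kept whole once it is shorter than 20 kb) can be written out as a short sketch. This is an independent illustration of the stated rule, not the code of cut_up_fasta.py, and the function name is an assumption.

```python
def split_contig(length: int, chunk: int = 10_000) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of the pieces of a contig."""
    pieces = []
    start = 0
    # Keep cutting while at least two full chunks remain.
    while length - start >= 2 * chunk:
        pieces.append((start, start + chunk))
        start += chunk
    # The remainder (shorter than two chunks) stays in one piece.
    pieces.append((start, length))
    return pieces


print(split_contig(35_000))
```

Under this rule every piece of a contig longer than 10 kb is between 10 kb and 20 kb long, so no very short fragments are created at contig ends.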


Metagenome assembly and binning of Pacific Ocean samples. The Tara Oceans 
project has generated a wealth of metagenomic sequence data across 243 samples 
enriched for prokaryotes. Ideally, one would use all this information to acquire 
genomic data from novel alphaproteobacteria. This is however too computation- 
ally demanding, and thus a sub-selection of samples had to be made. To identify 
samples of interest, we applied the following strategy: first, we applied the RP15 
pipeline (backbone: bacteria, phylogenetic inference: FastTree) on all publicly avail- 
able metagenome assemblies of these samples (Supplementary Table 5). Then, for 
samples that the RP15 tree suggested to contain alphaproteobacterial lineages, 
ribocontigs phylogenetically classified as alphaproteobacteria were incorporated 
in another RP15 dataset (backbone: alphaproteobacteria, phylogenetic inference: 
RAxML) to obtain an improved phylogenetic resolution within the alphaproteo- 
bacteria. This screen showed that the large majority of the 243 samples contained 
novel alphaproteobacterial lineages. Finally, 45 samples (Supplementary Table 6) 
were chosen based on the presence of unique novel lineages and larger sequencing 
depth. This increases the chance to properly assemble rare, but phylogenetically 
interesting, novel alphaproteobacteria. 

Each sequencing run of the selected samples (Supplementary Table 6) was pro- 
cessed with either Trimmomatic v.0.35 only or with SeqPrep (https://github.com/ 
jstjohn/SeqPrep) before Trimmomatic. SeqPrep merged overlapping read pairs 
into longer single reads (maximum Phred score per base-call in merged read: 41) 
and trimmed read-through Illumina adapters (-A GATCGGAAGAGCACAGG, -B 
AGATCGGAAGAGCGTCGT). Trimmomatic trimmed (residual) read-through 
Illumina adapters (ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:1:true), trimmed 
low quality base-calls at the starts and ends of reads (LEADING:3, TRAILING:3), 
removed reads shorter than 60 bp (MINLEN:60) and finally removed reads that 
have an average Phred score lower than 30 (AVGQUAL:30) (27, 28 or 29 for some 
samples; see Supplementary Table 7). Single-end mode was used for merged reads 
and paired-end mode for read pairs. The overall quality and presence of adaptor 
sequences in the unprocessed reads and different versions of processed reads were 
evaluated with FastQC v.0.11.4. 

Although all samples have already been assembled using SOAPdenovo, 
we chose to re-assemble the metagenomes of ‘122 deep chlorophyll maximum 
(DCM) layer 0.22-0.45-µm’, ‘125 surface (SRF) layer 0.22-0.45-µm’ and ‘125 
marine epipelagic mixed (MIX) layer 0.22-0.3-µm’ samples (Supplementary 
Table 6) with metaSPAdes, implemented in SPAdes v.3.7.0. metaSPAdes is 
designed specifically to deal with metagenomic data. For the assembly the 
k-mers 21, 33, 55 and 77 were used. For 122 DCM 0.22-0.45, reads processed with 
SeqPrep and Trimmomatic were used as input. SeqPrep-merged reads were pooled 
with the Trimmomatic unpaired reads to serve as ‘unpaired reads’. For 125 SRF 
0.22-0.45 and 125 MIX 0.22-0.3, reads processed with Trimmomatic only were 
used as input. Analysis with QUAST showed that, when considering only contigs 
of >500 bp, metaSPAdes assemblies typically had an approximately 2× larger 
total assembly size, and notably larger contigs compared with the SOAPdenovo 
assemblies (data not shown). 

Per assembly, all contigs were split by cutting every 10 kb, unless the remaining 
fragment was smaller than 20 kb, with the CONCOCT script cut_up_fasta.py. 
Then kallisto v.0.42.5 was used to map the SeqPrep and Trimmomatic preprocessed 
reads from each of the 45 selected samples’ sequencing runs against 
the split-contigs of each of the three assemblies. kallisto was run in single-end 
mode, treating all reads as unpaired reads. The kallisto output was transformed 
into a matrix containing the differential coverage profiles for each split-contig, 
using the input_table.py script that was provided by J. Alneberg (see ‘Code avail- 
ability’). CONCOCT v.0.4.0 then clustered the split-contigs into bins using their 
tetranucleotide frequency profiles and differential coverage profiles. CONCOCT 
was run separately for minimum split-contig length cutoffs 2 kb and 3 kb. Bins that 
corresponded to potential novel alphaproteobacteria were identified by check- 
ing which bins contained alphaproteobacterial ribocontigs. The bins were then 
assessed and cleaned using information of differential coverage, GC composition, 
linkage and presence of 139 genes well-conserved across Bacteria with mmgenome. 
Linkage information, that is, which contigs were connected through read-pairs, was 
obtained by mapping read pairs processed with SeqPrep and Trimmomatic with 
the map-bowtie2-markduplicates.sh script (Bowtie2 v.2.1.0) onto the split-contigs. 
By default, only the bins generated with the 2-kb length cutoff were chosen, unless 
such a bin was deemed unsatisfactory. In that case, the corresponding bin that was 
generated with the 3-kb length cutoff was chosen. If multiple full-length contigs 
had their corresponding split-contigs distributed across two different bins, the bins 
were merged within mmgenome before bin cleaning. Finally, split-contigs were 
replaced by their full-length contigs. In case not all split-contigs from a correspond- 
ing full-length contig were present in a cleaned bin, the full-length contig was only 
included in the final bin if the majority of split-contigs was present. 
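
The majority rule for restoring full-length contigs can be sketched as follows. The mapping from split-contig to parent contig and the identifiers are hypothetical, and "majority" is read here as strictly more than half, which is an assumption about the intended threshold.

```python
from collections import Counter


def restore_full_contigs(bin_pieces: set[str],
                         piece_to_contig: dict[str, str]) -> set[str]:
    """Return parent contigs whose split-contigs are mostly in the bin."""
    total = Counter(piece_to_contig.values())
    in_bin = Counter(piece_to_contig[p] for p in bin_pieces)
    # Strict majority: more than half of a contig's pieces must be present.
    return {contig for contig, n in in_bin.items() if 2 * n > total[contig]}


piece_to_contig = {
    "c1.0": "c1", "c1.1": "c1", "c1.2": "c1",
    "c2.0": "c2", "c2.1": "c2",
}
cleaned_bin = {"c1.0", "c1.1", "c2.0"}
print(restore_full_contigs(cleaned_bin, piece_to_contig))
```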
Phylogenetic diversity in metagenome assemblies. The RP15 pipeline was used 
to evaluate the phylogenetic diversity present within all seven assembled sam- 
ples. The pipeline was executed on all seven complete metagenome assemblies 
simultaneously (backbone: bacteria, phylogenetic inference: RAxML), yielding 
the phylogenetic tree shown in Supplementary Fig. 1. Ribocontigs phylogeneti- 
cally classified as alphaproteobacteria were incorporated in another RP15 dataset 
(backbone: alphaproteobacteria, phylogenetic inference: RAxML), yielding the 
tree shown in Fig. 1b and Supplementary Fig. 2. 

Completeness and redundancy estimation. The completeness and redundancy 
of each final bin were estimated by checking for the presence of 139 marker 
genes that are well conserved across bacteria. A detailed description of the method was previously 
published. In brief, it aims to provide more weight to marker genes that typically 
have a large distance towards other marker genes, and a smaller weight to marker 
genes that typically have a shorter distance towards other marker genes. 
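As a point of reference, an unweighted version of such an estimate can be sketched in a few lines of Python. This is a deliberate simplification and not the published method: the distance-based marker weights described above are omitted, and the definition of redundancy as surplus copies divided by the number of markers is an assumption.

```python
def completeness_and_redundancy(marker_copies, n_markers=139):
    """Unweighted estimate from per-marker copy numbers found in one bin."""
    present = sum(1 for n in marker_copies.values() if n >= 1)
    extra = sum(n - 1 for n in marker_copies.values() if n > 1)
    completeness = present / n_markers
    redundancy = extra / n_markers  # assumption: surplus copies per marker
    return completeness, redundancy

# Toy bin: 100 markers found once, 10 found twice, 29 missing.
copies = {f"marker_{i}": 1 for i in range(100)}
copies.update({f"marker_{i}": 2 for i in range(100, 110)})
completeness, redundancy = completeness_and_redundancy(copies)
```

A weighted variant would replace the plain counts with sums of per-marker weights.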
Annotation. All bins were annotated with prokka v.1.12, which was altered to 
allow for partial gene predictions on contig edges (GitHub pull request #219), with 
the options --compliant, --partialgenes, --cdsrnaolap and --evalue 1e-10, and with 
barrnap as the rRNA predictor. GC content and median intergenic space distances 
(only considering the coding sequence (CDS), rRNA and tRNA features as genes) 
were calculated with in-house scripts. 

Phylogenomics dataset for alphaproteobacteria. To build a phylogenomics 
dataset consisting of highly conserved, vertically evolving genes that undergo a 
minimum of horizontal gene transfers across a phylogenetically diverse set of alp- 
haproteobacteria, we used the following strategy: first, we selected 165 aproNOGs 
(available from the eggNOG database v.4.5) that had exactly one copy in >95% 
of the taxa present in the aproNOGs. During the selection, the taxa Ehrlichia ruminantium 
Welgevonden, Gluconacetobacter diazotrophicus PAl5, Oligotropha carboxidovorans 
OM5, Bartonella bacilliformis KC583, ‘Candidatus Hodgkinia cicadicola 
Dsem’, ‘Candidatus Liberibacter asiaticus psy62’ and ‘Candidatus Liberibacter 
solanacearum CLso-ZC1’ were not considered, because they were exceptions that 
typically contained two copies or more of these genes. Because the set of taxa in 
the aproNOGs is limited, and lacks many phylogenetically informative taxa, and 
by definition Beta- and Gammaproteobacteria that are needed for the outgroup, 
we added orthologues from a phylogenetically diverse set of publicly available 
alpha-, beta- and gammaproteobacteria (Supplementary Table 8), and the bins 
obtained in this study. Orthologue detection was done by using the trimmed align- 
ments (available from eggNOG) of these 165 aproNOGs as queries in a PSI-BLAST 
search against the protein complements of to-be-added genomes. After PSI-BLAST 
hits were added to the original aproNOGs, single gene trees were inferred with 
RAxML (LG + Γ) to identify and remove all non-orthologues (paralogues, contaminations 
present in poorly curated genomes). Resulting clusters of orthologous 
genes (COGs) that did not contain >6 outgroup taxa that were monophyletic, did 
not have all major alphaproteobacterial taxonomic groups represented by at least 
one taxon, and did not contain >85% of all taxa were removed. A RAxML tree was 
inferred from the concatenation of the remaining 90 COGs to make a final taxon 
selection. We aimed to reduce the number of taxa to a computationally tractable 
number while keeping the phylogenetic diversity as high as possible. While mak- 
ing single gene trees, we observed that several bins, despite coming from differ- 
ent samples, were virtually phylogenetically identical (tip-to-tip distance: <0.01 
substitutions per site). We therefore decided to collapse these near-identical bins 
into ‘composite bins’. A composite bin consists mostly of the phylogenetic marker 
genes from the bin that is most complete among the near-identical bins, but is 
complemented with marker genes belonging to less complete bin(s). This action 
simultaneously reduced the number of taxa and the percentage of missing data 
while keeping the phylogenetic diversity the same. We then applied a discordance 
filter to detect COGs potentially affected by within-Alphaproteobacteria horizontal 
gene transfer. We removed the 20% most discordant COGs. The remaining 72 COGs are 
from here on referred to as ‘alphaCOGs’ (Supplementary Table 2). 
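Two of the selection steps in this section lend themselves to a short illustration: the single-copy criterion (exactly one copy in >95% of taxa, after setting aside taxa known to carry extra copies) and the removal of the 20% most discordant COGs (90 to 72). The Python sketch below is illustrative only. All names are hypothetical, and the per-COG discordance score is assumed to be given, since the discordance filter itself is a separately published method.

```python
def single_copy_fraction(copy_counts):
    """Fraction of taxa in which the orthologous group has exactly one copy."""
    return sum(1 for n in copy_counts.values() if n == 1) / len(copy_counts)

def select_single_copy(groups, excluded_taxa=frozenset(), threshold=0.95):
    """Keep groups that are single-copy in more than `threshold` of the taxa."""
    selected = []
    for name, counts in groups.items():
        considered = {t: n for t, n in counts.items() if t not in excluded_taxa}
        if considered and single_copy_fraction(considered) > threshold:
            selected.append(name)
    return selected

def drop_most_discordant(scores, fraction=0.20):
    """Remove the given fraction of groups with the highest discordance score."""
    n_drop = round(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get)  # least discordant first
    return ranked[: len(ranked) - n_drop]

# Hypothetical toy data: NOG_b has two copies in 4 of 40 taxa.
groups = {
    "NOG_a": {f"taxon{i}": 1 for i in range(40)},
    "NOG_b": {**{f"taxon{i}": 1 for i in range(36)},
              **{f"taxon{i}": 2 for i in range(36, 40)}},
}
multi_copy_taxa = {f"taxon{i}" for i in range(36, 40)}
strict = select_single_copy(groups)
relaxed = select_single_copy(groups, excluded_taxa=multi_copy_taxa)

# 90 COGs with made-up discordance scores; the 18 highest are dropped.
scores = {f"COG{i}": i / 100 for i in range(90)}
kept = drop_most_discordant(scores)
```

Excluding the taxa that systematically carry extra copies is what lets `NOG_b` pass in the second call.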
Phylogenomics datasets for alphaproteobacteria and mitochondria. To build a 
phylogenomics dataset consisting of highly conserved, vertically inherited genes 
that undergo a minimum of horizontal gene transfers across a diverse set of alp- 
haproteobacteria and gene-rich mitochondrial genomes, we used the following 
strategy: first, we detected orthologues of all alphaCOGs in the mitochondrial 
genome of A. godoyi with a PSI-BLAST search. We chose this genome because, 
with 66 genes, it is currently the most gene-rich mitochondrial genome available 
and is present in the publicly available and curated mitoCOGs. We then used 
the alphaCOG-Andalucia orthologue connection to merge 14 alphaCOGs with 
their corresponding mitoCOGs. Before merging, the mitoCOGs were filtered for 
the taxa A. godoyi, Seculamonas ecuadoriensis, Histiona aroides, Reclinomonas 
americana, Jakoba libera, J. bahamiensis, Physcomitrella patens, Ostreococcus 
tauri, P. infestans and M. jakobiformis. These mitochondrial genomes were cho- 
sen, because they were gene rich and exhibited short branch lengths in previous 
phylogenetic analyses (figure 1 and supplementary figures 5 and 6 of ref. °°). By 
choosing genomes associated with shorter branch lengths, we aimed to minimize 
long branch artefacts. We then used the remaining 52 Andalucia mitochondrial 
genes to build additional orthologous groups. These genes can still be phyloge- 
netically informative despite not being picked up by the stringent gene selection 
used to build the alphaCOGs. Orthologues from all considered alphaproteobacteria 
and bins were detected through a BLAST search. For each of the resulting 34 
orthologous groups with >70 taxa, a single gene tree was made with FastTree (LG 
model) to identify and remove non-orthologues (paralogues, contaminations for 
poorly annotated genomes). The cleaned orthologous groups were then merged 
with taxon-selected mitoCOGs (see above) using the mitoCOG membership 
information of each Andalucia gene. Phylogenetically near-identical bins were 
collapsed into composite bins as described in ‘Phylogenomics dataset for alphaproteobacteria’. 
Next, RAxML single gene trees were inferred for all 48 orthologous 
groups and visually inspected. We finally selected 24 orthologous groups for which 
the outgroup exhibited monophyly, mitochondria exhibited monophyly, the alp- 
haproteobacteria topology was congruent with the expected species tree (that is, 
were minimally affected by horizontal gene transfer) and for which the longest 
mitochondrial branch was <1.5× longer than the longest alphaproteobacterial 
branch (root-to-tip lengths). These 24 orthologous groups are hereafter referred 
to as ‘alphamitoCOGs’ (Supplementary Table 2). Except for rpl2 from Ostreococcus, 
all eukaryotic genes were mitochondrial encoded. 

To evaluate the robustness of the phylogenetic placement of mitochondria 
recovered from the dataset described above, we built four additional phylogenomics 
datasets. (i) alphamitoCOGs with a more diverse mitochondrial taxon 
sampling. Taxa were selected as follows: for all taxa in the mitoCOGs that encode 
at least 10 out of 24 alphamitoCOGs on the mitochondrial genome, a concatenated 
supermatrix was constructed as described above. A phylogeny was then inferred 
with IQTREE (LG + F + R7, model selected by IQTREE’s ModelFinder). Taxa 
were then selected based on phylogenetic diversity, number of genes and short 
branch lengths (Supplementary Fig. 21). (ii) alphamitoCOGs for which the outgroups 
MarineProteo1, Magnetococcales, Beta- and Gammaproteobacteria were 
removed. (iii) alphamitoCOGs plus 14 mitochondrial-encoded genes for which 
mitochondrial taxa were placed on moderately longer branch lengths compared to 
alphaproteobacteria (<2.5× longer than the longest alphaproteobacterial branch 
(root-to-tip lengths)). And (iv) a set of 29 nuclear encoded genes that were previously 
identified. To build the latter dataset, we detected their orthologues in all 
reference taxa (Supplementary Table 8) and bins through PSI-BLAST searches, 
using trimmed alignments of the publicly available 29 orthologous groups 
(Supplementary Information, dataset 7) as query. Orthologues from Arabidopsis 
thaliana, Cryptococcus neoformans, M. brevicollis, Nematostella vectensis, P. infes- 
tans and Spizellomyces punctatus were directly added from this dataset. The result- 
ing COGs were checked and cleaned for non-orthologues, and near-identical bins 
collapsed into composite bins as described above. 
Phylogenetic inference. From the alphaCOGs and alphamitoCOGs phylogenom- 
ics datasets, three supermatrix alignments were prepared: (i) the concatenation 
of all aligned orthologous groups (alignment: MAFFT L-INS-i, alignment trimmer: 
BMGE -m BLOSUM30), (ii) alignment (i) recoded into the SR4 categories 
(AGNPST, CHWY, DEKQR, FILMV) and (iii) alignment (i) trimmed with 
BMGE’s stationary-based trimmer (BMGE -s FAST -h 0:1 -g 1). This trimmer 
removes sites from an alignment until no pair of taxa is significantly 
compositionally heterogeneous according to Stuart’s test of marginal homogeneity. 
Supermatrices (ii) and (iii) represent two different strategies to reduce composi- 
tional heterogeneity across taxa, a strong source of artefactual phylogenetic signal 
within the Alphaproteobacteria and mitochondria. An overview of all alignments 
(length, missing data, χ² score and informative sites) can be found in Supplementary 
Table 3. All alignments were then used for phylogenetic reconstruction under the 
CAT + GTR + Γ4 model as implemented in PhyloBayes MPI v.1.7a. Four independent 
Markov chain Monte Carlo (MCMC) chains were run until convergence 
(maxdiff < 0.3) or a sufficient effective sample size was reached (effsize > 300). In 
cases in which such an effective size was not reached due to computational limitations, 
the MCMC distributions of the log-likelihood values, the total tree-lengths, 
the α parameter of the gamma distribution and the number of categories were 
visually inspected to assess whether a chain reached a sufficient sample size. When 
making consensus trees, the first 5,000 cycles (for amino acid encoded alignments) 
or the first 2,000 cycles (for SR4 encoded alignments) were discarded as burn-in. 
In addition, all non-recoded alignments were used for maximum-likelihood phylogenetic 
reconstruction under the LG + C60 + F + Γ4 model (1,000 ultra-fast 
bootstraps) and its PMSF approximation (100 non-parametric bootstraps; guide 
tree: LG + F + Γ4) as implemented by IQTREE v.1.5.0a. The PMSF model is a 
computationally efficient approximation of the empirical profile mixture models C10 
to C60 that allows the user to use non-parametric bootstraps. 
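The SR4 recoding used for supermatrix (ii) collapses the 20 amino acids into the four categories listed above. A minimal Python sketch follows. The choice of A, C, G and T as state labels, and of `?` for residues outside the 20 standard amino acids, are assumptions for illustration; any four distinct symbols would do.

```python
# SR4 categories as given in the text; the state labels are an assumption.
SR4_GROUPS = {"AGNPST": "A", "CHWY": "C", "DEKQR": "G", "FILMV": "T"}
SR4_TABLE = {aa: state for group, state in SR4_GROUPS.items() for aa in group}

def recode_sr4(seq):
    """Recode a protein sequence into four SR4 states; keep gaps, mask the rest."""
    out = []
    for ch in seq.upper():
        if ch in SR4_TABLE:
            out.append(SR4_TABLE[ch])
        elif ch == "-":
            out.append("-")   # alignment gap is preserved
        else:
            out.append("?")   # ambiguous or unknown residue (for example X)
    return "".join(out)

recoded = recode_sr4("MKV-LAXW")
```

Recoding reduces compositional signal at the cost of discarding substitutions that occur within a category.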

From each of the four additional phylogenomics datasets, we prepared a con- 
catenated supermatrix and a stationary-trimmed concatenated supermatrix in 
the same manner as was done for the alphaCOG and alphamitoCOG datasets. 
These were then used for maximum-likelihood phylogenetic reconstruction as 
described above. 

Posterior predictive checks. Posterior predictive checks as implemented in 
PhyloBayes MPI v.1.7a were performed to control whether the inferred phylogenetic 
models adequately captured the across-taxa compositional heterogeneity and 
site-specific pattern diversity present in the alignments. In brief, it simulates 
alignments based on model parameter configurations sampled (here: every 50 
cycles after the burn-in) from the posterior distribution (a given MCMC chain). 
It then calculates a set of test statistics of interest (here the maximum deviation 
and mean square deviation between global and taxon-specific empirical frequen- 
cies that reflects compositional heterogeneity across taxa, and the mean number 
of distinct characters per site that reflect site-specific pattern diversity) for each 
simulated alignment, yielding a null distribution per test statistic. Finally, the null 
distribution is compared to the observed value of the test statistic of the original 
alignment and a P value and a z score are calculated (one-sided test). Thus, if the 
observed value of the test statistic falls within the 95% confidence interval of the 
null distribution, the conclusion is that the property that the test statistic reflects 
was adequately modelled. 
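The final comparison step of such a check can be sketched as follows. This is a generic illustration in Python, not the PhyloBayes implementation: the function name is hypothetical, and the use of the population standard deviation and of an upper-tail P value are assumptions.

```python
from statistics import mean, pstdev

def posterior_predictive_summary(observed, simulated):
    """Compare an observed test statistic with its simulated null distribution."""
    mu = mean(simulated)
    sd = pstdev(simulated)  # assumption: population standard deviation
    z = (observed - mu) / sd
    # One-sided P value: share of simulated values at least as large as observed.
    p = sum(1 for s in simulated if s >= observed) / len(simulated)
    return z, p

# Made-up values of a heterogeneity statistic from five simulated alignments.
simulated = [0.10, 0.12, 0.11, 0.09, 0.13]
z, p = posterior_predictive_summary(0.20, simulated)
```

An observed value far in the upper tail, as in this toy case, indicates that the model does not reproduce the heterogeneity seen in the real alignment.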

Progressive reduction of compositional heterogeneity. To investigate how com- 
positional heterogeneity affects the tree topology in more detail, we executed a 
scheme in which we reduced the heterogeneity of a given alignment in a stepwise 
fashion and subsequently tracked the bootstrap supports of certain branches of 
interest. To do this, the unchanged supermatrices (alignments (i)) of both phylogenomics 
datasets were subjected to an alignment trimmer (χ² trimmer; see ‘Code 
availability’) that removes sites that contribute most to the overall heterogeneity 
(the total χ² score). Here, we prepared a set of alignments with progressively 
more heterogeneous sites (0, 5, 10, 15, 20, 25, 30, 35 and 40%) removed, and thus 
with decreasing overall heterogeneity. For each resulting alignment, we inferred 
100 non-parametric bootstraps under the PMSF model (guide tree: LG + F + Γ4, 
mixture model: LG + C60 + F + Γ4), implemented in IQTREE v.1.5.0a. Finally, we 
extracted the bootstrap supports for each bipartition of interest (see main text) and 
tracked them along the decreasing heterogeneity gradient. The alignment with the 
20% most heterogeneous sites removed was furthermore used for a phylogenetic 
reconstruction with PhyloBayes MPI under the CAT + GTR + Γ4 model. 
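The idea behind a χ² trimmer can be sketched in Python as below. This is not alignment_pruner.pl: the score definition (sum over taxa of the χ² distance between taxon-specific and pooled amino acid counts) and the leave-one-site-out ranking are assumptions chosen to illustrate the principle, and the naive recomputation per site is only practical for small alignments.

```python
from collections import Counter

def chi2_score(seqs):
    """Sum over taxa of the chi-square distance to the pooled composition."""
    per_taxon = [Counter(ch for ch in s if ch != "-") for s in seqs]
    pooled = Counter()
    for c in per_taxon:
        pooled.update(c)
    total = sum(pooled.values())
    if total == 0:
        return 0.0
    score = 0.0
    for c in per_taxon:
        n = sum(c.values())
        for aa, pooled_count in pooled.items():
            expected = n * pooled_count / total
            if expected > 0:
                score += (c[aa] - expected) ** 2 / expected
    return score

def trim_most_heterogeneous(seqs, fraction):
    """Drop the given fraction of sites whose removal lowers the score the most."""
    length = len(seqs[0])
    full = chi2_score(seqs)

    def without(i):
        return [s[:i] + s[i + 1:] for s in seqs]

    # Contribution of a site: full score minus the score without that site.
    contribution = {i: full - chi2_score(without(i)) for i in range(length)}
    n_drop = int(length * fraction)
    worst = set(sorted(contribution, key=contribution.get, reverse=True)[:n_drop])
    return ["".join(ch for i, ch in enumerate(s) if i not in worst) for s in seqs]

# Toy alignment: only the third column differs between taxa.
alignment = ["AAGA", "AAGA", "AADA"]
trimmed = trim_most_heterogeneous(alignment, 0.25)
```

Calling the function with increasing fractions (0.05, 0.10, and so on) would give a series of progressively more homogeneous alignments like the one described above.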
Topology tests. For each maximum-likelihood phylogenetic analysis that included 
mitochondria, we used the approximately unbiased test as implemented by 
CONSEL v.0.20 to evaluate ten hypotheses for the phylogenetic origin of the 
mitochondria: sister to Rickettsiales, Alphaproteobacteria, Pelagibacterales, all 
non-Rickettsiales alphaproteobacteria, Tistrella mobilis and Geminicoccus roseus, 
Tistrella mobilis, Geminicoccus roseus, MarineAlpha9 and 10, MarineAlpha9 or 
MarineAlpha10. We first obtained maximum-likelihood trees with IQTREE 
under the constraint defined by each hypothesis (-g). We then added bootstrap 
trees with branch lengths (100 out of the 1,000 ultra-fast bootstraps in case of 
LG + C60 + F + Γ4, 100 non-parametric bootstraps in case of the PMSF approximation) 
from the unconstrained maximum-likelihood search to improve the 
accuracy of the approximately unbiased test. Site-likelihoods for all 110 trees 
were then calculated under the LG + C60 + F + Γ4 model or its PMSF approximation 
with IQTREE (-wsl). Finally, the approximately unbiased test was performed with 
CONSEL v.0.20 using the IQTREE generated site-likelihoods. 

Parametric simulations. A maximum likelihood tree was inferred with IQTREE 
from the stationary-trimmed alignment of the alphamito24 dataset under the 
Rickettsiales-sister constraint (-g) and the LG + C60 + F + Γ4 model. The resulting 
maximum likelihood estimates of the model parameters, that is, mixture 
weights (+C60), observed amino acid frequencies (+F) and alpha shape parameter 
(+Γ4), and the tree were then used to simulate ten independent supermatrix 
alignments of the same length as the original stationary-trimmed alignment with 
SiteSpecific.seqgen, an adjusted version of SeqGen able to simulate using mixture 
models. 

Random sequence analysis. Ten independent datasets were generated by replacing 
all mitochondrial sequences in the alphamito24 dataset with random sequences 
of the same length using the EMBOSS tool makeprotseq. For each dataset, a 
stationary-trimmed supermatrix alignment was prepared in the same manner as 
for the other datasets in this study. 
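The replacement step can be mimicked with a few lines of Python. This sketch is a stand-in for the EMBOSS makeprotseq call, not a wrapper around it: the function name, the uniform sampling over the 20 amino acids and the seeding scheme are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def randomize_sequences(dataset, targets, seed=None):
    """Replace the sequences of the target taxa by random ones of equal length."""
    rng = random.Random(seed)
    out = {}
    for taxon, seq in dataset.items():
        if taxon in targets:
            # Assumption: residues are drawn uniformly from the 20 amino acids.
            out[taxon] = "".join(rng.choice(AMINO_ACIDS) for _ in seq)
        else:
            out[taxon] = seq
    return out

# Hypothetical toy dataset with one mitochondrial and one bacterial sequence.
dataset = {"mito_A": "MKLVFTAGH", "alpha_B": "MRLIYTSGH"}
replicates = [randomize_sequences(dataset, {"mito_A"}, seed=i) for i in range(10)]
```

Each replicate would then be aligned and trimmed like a real dataset, so that any apparent placement of the random sequences reflects artefact rather than signal.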

Reporting Summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. input_table.py is freely available on GitHub (http://github.com/EnvGen/toolbox/tree/master/scripts/kallisto_concoct/input_table.py). 
alignment_pruner.pl is freely available on GitHub (https://github.com/novigit/davinciCode/tree/master/perl/alignment_pruner.pl). 

Data availability. The genome bins described in this study have been deposited at 
DDBJ/EMBL/GenBank under the BioProject ID PRJNA390581 and whole-genome 
sequencing (WGS) accessions PTJW00000000-PTLO00000000, with corresponding 
versions PTJW01000000-PTLO01000000. All metagenome assemblies and 
supermatrix alignments generated in this study are archived at the Dryad Digital 
Repository: https://datadryad.org//resource/doi:10.5061/dryad.068d0d0. 

The blastocyst (the early mammalian embryo) forms all embryonic 
and extra-embryonic tissues, including the placenta. It consists 
of a spherical thin-walled layer, known as the trophectoderm, 
that surrounds a fluid-filled cavity sheltering the embryonic 
cells. From mouse blastocysts, it is possible to derive both 
trophoblast and embryonic stem-cell lines, which are in vitro 
analogues of the trophectoderm and embryonic compartments, 
respectively. Here we report that trophoblast and embryonic stem 
cells cooperate in vitro to form structures that morphologically 
and transcriptionally resemble embryonic day 3.5 blastocysts, 
termed blastoids. Like blastocysts, blastoids form from inductive 
signals that originate from the inner embryonic cells and drive 
the development of the outer trophectoderm. The nature and 
function of these signals have been largely unexplored. Genetically 
and physically uncoupling the embryonic and trophectoderm 
compartments, along with single-cell transcriptomics, reveals 
the extensive inventory of embryonic inductions. We specifically 
show that the embryonic cells maintain trophoblast proliferation 
and self-renewal, while fine-tuning trophoblast epithelial 
morphogenesis in part via a BMP4/Nodal-KLF6 axis. Although 
blastoids do not support the development of bona fide embryos, 
we demonstrate that embryonic inductions are crucial to form 
a trophectoderm state that robustly implants and triggers 
decidualization in utero. Thus, at this stage, the nascent embryo 
fuels trophectoderm development and implantation. 

Although stem cells mimic development, a stem cell-based model 
of the blastocyst is lacking. To address this, we confined cultured 
embryonic stem (ES) cells to an average of five cells per microwell 
(Extended Data Fig. 1a). These formed non-adherent aggregates within 
24h, over which we laid trophoblast stem (TS) cells (Fig. 1a). Upon 
aggregation, the ES cells induced the formation of trophoblast cysts 
(65h). At low frequency (less than 0.3%), cells spontaneously organized 
into regular TS cell cysts with internal ES cell aggregates, which we 
termed blastoids (Extended Data Fig. 1b). 

We optimized the organization by increasing the engulfment of ES 
and TS cells (Extended Data Fig. 1c, d), and analysed the functions of 
the WNT and cAMP signalling pathways, which correlate with the 
formation of the blastocoel fluid-filled cavity. Indeed, blastocysts 
exposed to inhibitors of tankyrase or protein kinase A (PKA), which 
are used as antagonists of WNT and cAMP signalling, respectively, 
developed a smaller blastocoel (Extended Data Fig. 1e). Consistently, 
the trophectoderm cells and TS cells produce WNT family members 
6 (Wnt6) and 7b (Wnt7b), and display autocrine WNT activity (Extended Data 
Fig. 1f, g). cAMP and WNT pathway stimulation increased TS cell cavitation 
and blastoid formation using several ES and TS cell lines (Fig. 1b, 
Extended Data Fig. 1h, i). Blastoids formed efficiently at specific ratios 
(70% when 8 ES cells and 20 TS cells are combined; Fig. 1c), expanded 
and stabilized within 65h after seeding to diameters similar to E3.5 
blastocysts (90 μm; Fig. 1d–f and Extended Data Fig. 1j). 




We next analysed blastocyst transcription factors. Although the 
expression of pluripotency markers OCT4 (also known as POU5F1) 
and NANOG was maintained in the pluripotent compartment 
(Fig. 2a, b), mRNA expression of the trophoblast marker Cdx2 was 
low in both TS cells and blastoids compared to blastocysts (Extended 
Data Fig. 2a, b). The addition of the TS cell regulators FGF4 and 
TGFβ1 increased CDX2 expression in blastoids, but levels remained 
low (Extended Data Fig. 2b), prompting us to seek new regulators. On 
the basis of observations of the STAT pathway members in blastocysts 
(Extended Data Fig. 2c-g), we found that IL-11 (the most abundantly 
expressed ligand) or 8Br-cAMP increased CDX2 protein 
levels, complemented FGF4 and TGFβ1, maintained trophectoderm 
markers, and enhanced CDX2 expression in blastoids (Extended 
Data Figs. 2b, 3a–e and Fig. 2c). Blastoids remained permissive to the 
formation of primitive endoderm-like cells, albeit at lower numbers 
than blastocysts (Fig. 2d, Extended Data Fig. 4a-c). Cell number in 
the blastoid compartments was reminiscent of a mid-stage blastocyst 
(Extended Data Fig. 4a-c). 

After injection into a foster blastocyst, ES and TS cells contribute to 
embryo and placental development, respectively. We analysed whether 
ES or TS cell lines derived de novo from blastoids had similar competences. 
The efficiency of TS cell derivation was similar between 
blastoids and blastocysts (90% generate CDX2+ colonies at passage 2, 
Extended Data Fig. 5a), whereas CDX2-low blastoids (see Methods) 
displayed lower derivation efficiency (35%). After injection into blasto- 
cysts, these de novo TS and ES cells contributed to the formation of the 
extra-embryonic tissues (embryonic day (E) 6.5 and E11.5 embryos), 
and epiblast, respectively (Fig. 2e, Extended Data Fig. 5a). Thus, the 
blastoid environment maintains the developmental potential of both 
lineages. 

Blastoids transferred into the uterus of pseudo-pregnant mice 
induced deciduae formation with typical local vascular permeability 
(Fig. 2f), suggesting anastomosis of trophoblasts with the mother’s 
vascular system. Indeed, blastoid trophoblasts generated cells incorporated 
into the ingrowing maternal vasculature and expressing 
proliferin, a marker for trophoblast giant cells (Fig. 2g). Blastoids 
did not support full bona fide embryonic development, but generated 
numerous cells positive for a variety of extra-embryonic markers of 
different post-implantation trophoblast cell types (Extended 
Data Fig. 5b). Decidualization is a complex process regulated by the 
hormonal cycle and endometrium and stimulated by the embryo in 
rodents. It can be partly reproduced by intraluminal deposition of 
sesame oil (deciduoma). We thus analysed the specificity of the decid- 
ualization process. Injection of vehicle medium alone did not induce 
decidualization (Fig. 2h and Extended Data Fig. 5c). Blastocysts and 
blastoids formed small, discrete deciduae, whereas oil triggered the 
formation of larger, more continuous deciduomata (Extended Data 
Fig. 5d). How the conceptus regulates decidualization is not well 
understood; however, Aldh3a1 is one of the rare genes induced in the 
decidua by the implanting conceptus. Consistently, ALDH3A1 was 
locally expressed in a group of decidual cells in the mesometrial side 
of blastocyst- and blastoid-induced deciduae (Fig. 2i and Extended 
Data Fig. 5e). Thus, the blastoid recapitulates key aspects of uterine 
implantation (discrete decidualization, vascular anastomosis, and patterned 
ALDH3A1 expression in the decidua). 

Fig. 1 | Embryonic and trophoblast stem cells form blastocyst-like 
structures in vitro. a, Schematic of blastoid formation. ES and TS cells 
are derived from blastocysts (left), then sequentially combined using 
a microwell array (right). ES cells are seeded to form non-adherent 
aggregates within 24h (right, red arrow denotes a 24-h aggregate). TS cells 
are then added (right, green arrow denotes TS cells upon seeding). Scale 
bar, 200 μm. b, Aggregates of ES cells engulfed by TS cells were exposed 
to a WNT activator (3 μM CHIR99021; CHIR) and a cAMP analogue (0.2 
or 1mM 8Br-cAMP; cAMP). Yields of blastoids are shown (measured as 
the percentage of microwells containing a TS cell cyst enclosing an ES 
cell, 65h after TS cell addition). 8Br-cAMP plus CHIR99021 generated 
significantly higher yields of 9% blastoids (P = 0.006, two-sided Student's 
t-test). n = 3 independent microwell arrays. Error bars are s.d. c, Blastoid 
yield as a function of the initial number of TS and ES cells in individual 
microwells. 70% of microwells seeded with an optimal number of ES and 
TS cells contain a blastoid. d, Left, evolution of blastoid morphology from 
24 to 65h. Right, blastoids formed with ES cells positive for red fluorescent 
protein (RFP)-tagged histone H2B (H2B-RFP+ ES cells; red) and TS cells 
positive for green fluorescent protein (GFP) (GFP+ TS cells; green). Scale 
bars, 50 μm. e, Distribution of the overall diameters of blastoids, at 24 
and 65h, and of blastocysts collected on day E3.5. Bar plots indicate the 
percentage frequency of specific diameters. n = 50 independent blastoids 
or blastocysts. Vertical line denotes the median. f, Light microscopy image 
showing blastoids and E3.5 blastocysts (high magnification images shown 
in Extended Data Fig. 1). Scale bar, 200 μm. All experiments were repeated 
at least three times with similar results. 

Comparing transcriptomes using unsupervised clustering analysis 
(Fig. 3a) confirmed that blastoids resembled blastocysts at E3.5, when 


Fig. 2 | Blastoids implant in utero and trigger the formation of 
patterned deciduae. a, b, Immunofluorescent staining for NANOG 
(green, a) and OCT4 (green, b) in blastoids counterstained with F-actin 
(red). c, Immunofluorescent staining for CDX2 within blastoids (Extended 
Data Fig. 2b). d, Immunofluorescent staining for NANOG (white) 
in blastoids formed with ES cells comprising a primitive endoderm-specific 
PDGFRα-H2B-GFP reporter (green) and counterstained with 
F-actin (red). Scale bars, 50 μm (a–d). e, E6.5 and E11.5 embryos from 
blastocysts injected with de novo blastoid-derived ES cells (left) and TS 
cells (middle and right). Insets show the contribution of H2B-RFP+ ES 
cells to the epiblast (left) and of GFP+ TS cells to the extra-embryonic 
tissue (middle). Also see Extended Data Fig. 5a. f, Uterus transferred with 
blastoids at E3.3-E3.5 and explanted at E7.5. Mice are injected systemically 
with Evans blue dye, revealing typical local vascular permeability of the 
implantation sites (deciduae: white arrowheads; ovary: red arrowhead). 
g, Left, anti-GFP staining of a decidua containing an in utero developed 
blastoid formed with GFP+ TS cells (E6.5). Scale bar, 1 mm. Right, anti-GFP 
staining (top; scale bar, 100 μm) and proliferin (bottom; scale bar, 
50 μm) within histological sections of deciduae including vascular lumens 
(red arrowheads). h, Uterus transferred with blastoids (left horn) and 
vehicle medium (right) at E3.3-E3.5, and explanted at E6.5 (deciduae: 
white arrowheads; ovary: red arrowhead). i, Anti-ALDH3A1 staining of a 
representative decidua induced by a blastoid. B, blastoid implantation site; 
M, mesometrial side. Red arrowhead denotes the decidua sub-population 
expressing ALDH3A1. All experiments were repeated at least three times 
with similar results. 
Fig. 3 | Communication between blastoid embryonic and 
trophoblast compartments shifts the transcriptome towards an E3.5 
blastocyst state. a, Bulk sequencing. Whole-transcriptome distance map. 
The total number of either blastocysts or blastoids is 50. b-d, Single-cell 
RNA sequencing. b, t-distributed stochastic neighbour embedding 
(t-SNE) map representation of transcriptome similarities between 1,577 
cells collected from 2D-cultured parental ES cells in serum-free (2i) 
medium (316 cells) and parental TS cells in serum-free (TX) medium 
(290 cells), blastoid cells (367 cells), TS and ES cells cultured separately in 
blastoid medium within microwells (that is, 336 trophosphere cells, and 
268 embryoid body cells). Arrows indicate transcriptome shifts due either 
to the culture environment (for example, culture medium and microwell 
confinement) or to the blastoid environment (for example, ES and TS cell 
communication and confinement within the blastoid cyst). Clustering is 


lineage transcriptomes are stabilized, consistent with the committed 
state of TS and ES cells. Similar clustering occurred when limited to 
transcription factors, and dissimilarities were primarily related to metab- 
olism (Extended Data Fig. 6a, b and Supplementary Table 1 sheet 1). 
Importantly, the transcriptome of blastoids distinctly separated from 
their parental ES and TS cell lines (Fig. 3a). We then combined in silico 
the transcriptome of TS and ES cells cultured separately in blastoid 
medium: this virtual blastoid clustered with the parental cell lines, 
not with the blastoids (Fig. 3a), suggesting that the communication 
between the embryonic and trophoblast cells contributed to the E3.5- 
like transcriptome shift. Comparison with the blastoid transcriptome 
identified differences within cell cycle, epithelial junctions and MAPK 
signalling pathways (Extended Data Fig. 6b, Supplementary Table 1 
sheet 2), which all drive blastocyst formation. A role for the TGFβ 
signalling pathway also surfaced, the contribution of which to blastocyst 
formation is less clear. 

We further evaluated the relative contribution of the culture con- 
dition (for example, culture medium, microwell confinement) and 
blastoid environment (for example, ES and TS cell communication, 
blastoid cyst confinement) to the transcriptome shift via single-cell 
sequencing of 2D-cultured parental TS and ES cells, blastoid cells, 
and TS and ES cells cultured individually in blastoid culture environ- 
ment. In such conditions, TS cells alone formed trophospheres and 
ES cells formed embryoid bodies. Non-supervised clustering analysis 
clearly showed the separation of embryonic cells (ES cells, blastoid 
embryonic cells and embryoid body cells) and trophoblast cells 


108 | NATURE | VOL 557 | 3 MAY 2018 


[Fig. 3 panel labels. b, Origin of the single cells: ES cells in 2i medium (2D); embryoid bodies (3D); embryonic cells from blastoids; TS cells in TX medium (2D); trophosphere cells; trophoblasts from blastoids. Effect of the environments (culture environment, blastoid environment). Transcription factor-wide clustering. d, Gene set: genes enriched in blastocyst trophoblasts vs TS cells; NES = 2.08, ES = 0.86, FDR q = 0.005, P = 0.005.] 


based on the transcription factors expressed in the blastocyst. 
See whole-transcriptome clustering in Extended Data Fig. 6c, d. 
The embryonic or trophoblast blastoid cells are identified based on 
FACS-sorting indexes. c, t-SNE map representation of key transcription 
factors for the ICM, trophectoderm and placenta. d, Comparison of 
trophoblasts from blastoids and blastocysts. Gene set enrichment analysis. 
Genes are ranked according to their difference in expression between 
trophoblasts of blastoids (left of the horizontal axis) and parental TS cells 
(right of the horizontal axis). Black bars depict the position of a gene set of 
281 genes significantly enriched in trophoblasts of blastocyst compared to 
parental TS cells (P=0.05, see Supplementary Table 1 sheet 5). This gene 
set is significantly enriched in the trophoblasts of blastoids (normalized 
enrichment score (NES) = 2.08, P= 0.005). ES, enrichment score; FDR, 
false discovery rate. 


(TS cells, blastoid trophoblast cells and trophosphere cells) (Fig. 3b
and Extended Data Fig. 6c–e). Within the embryonic or trophoblast
cluster, cells changed mainly owing to the culture environment (for 
example, culture medium and microwell confinement) and to the blas- 
toid environment (for example, communication between embryonic 
and trophoblast cells, and cyst confinement). All embryonic cells main- 
tained core pluripotency markers and did not express epiblast stem-cell 
markers (Fig. 3c and Extended Data Fig. 6g). Consistent with the de 
novo derivation of stem-cell lines, the core transcription factors of the 
TS cells were also maintained in the blastoid environment (Fig. 3c). In 
sharp contrast, the absence of embryonic cells strongly decreased these 
transcription factors in the cells of trophospheres, which expressed 
differentiation genes typical of post-implantation placental cell types 
(Fig. 3c and Extended Data Fig. 6f, g). 

Consistent with the morphogenesis of a cyst, blastoid trophoblasts
were enriched in epithelial transcripts (Extended Data Fig. 6f, g and
Supplementary Table 1 sheets 4–6), the proteins of which were correctly
localized (Extended Data Fig. 7a–d). Finally, gene set enrichment analysis
revealed that blastoid trophoblasts were largely enriched in the
transcripts of blastocyst trophoblasts (Fig. 3d, Supplementary Table 1 
sheet 5). Thus, the blastoid environment prevents the differentiation 
of trophoblasts, while fuelling the epithelial morphogenesis of a 
trophectoderm-like cyst. 

To analyse functionally the embryonic inductions regulating trophectoderm
development, we tested the specificity of the compartments'
interactions by replacing the ES cells with other cell types,
Fig. 4 | Embryonic inductions regulate trophectoderm proliferation,
self-renewal, epithelial morphogenesis and implantation, partially via
BMP4 and Nodal. a, Blastoids formed with GFP+ TS cells and
H2B-RFP+ ES cells are compared with trophospheres formed with
GFP+ TS cells alone. Scale bar, 100 μm. Images are representative of three
experiments. b, c, Quantification of embryonic inductions previously
assessed in blastocysts. The schemes above the graphs depict blastoids
or trophospheres. b, Trophoblast proliferation. Count of the initial
number of TS cells seeded (see Extended Data Fig. 1a) and of those present after
65 h within blastoids (B) and trophospheres (T). Horizontal bars indicate
the mean number of trophoblasts. ****P = 2.10~°, two-sided Student's 
t-test. n = 30 blastoids or trophospheres. Error bars indicate s.d. See 
Extended Data Fig. 10b. c, Trophoblast self-renewal. Colony formation 
unit (CFU) determined as the number of CDX2+ colonies divided by the
mean number of trophoblasts in blastoids or trophospheres. Horizontal 
bars denote mean CFU. ****P=4.10~°, two-sided Student's t-test. 


including mouse epiblast stem cells (EpiSCs, in vitro analogues of 
E5.75 embryos). EpiSCs induced the formation of fewer blastoids 
(fivefold) of smaller size (Extended Data Fig. 8a). Other cell types 
did not support trophectoderm development, demonstrating that 
specific inductions emerge from ES cells. FGF4, originating from the
inner cell mass (ICM), regulates trophoblast proliferation and
self-renewal*”>°. Consistently, ES cells expressed eightfold more Fgf4
mRNA than EpiSCs (Extended Data Fig. 8b), and induced transcriptional
signatures for MAPK signalling activity and an enhanced
cell cycle in blastoid trophoblasts (Extended Data Fig. 8c, d and
Supplementary Table 1 sheet 7). Accordingly, there were twice as many
trophoblasts in blastoids as in trophospheres, and blastoid trophoblasts
formed more (fivefold) CDX2+ colonies than trophosphere
trophoblasts (Fig. 4b, c). We then tested whether morphogenesis of
the epithelial cyst, an important transformation within blastoids, is 
functionally regulated by resident ES cells. Consistent with induction 
of trophectoderm morphogenesis genes, ES cell titration within 
blastoids increased both blastoid cavitation and diameter (Extended 
Data Fig. 8e, f). Finally, as the trophoblast state regulates the decidu- 
alization response!*'®, we transferred blastoids and trophospheres in 
utero. Consistent with their post-implantation transcriptome signa- 
ture, trophospheres had diminished potential for decidualization as 
compared to blastoids (sixfold, Fig. 4d). Thus, resident embryonic cells 
functionally maintain trophoblast proliferation and self-renewal, and 
prevent trophoblast differentiation into post-implantation placental 
cell types, while shaping their epithelial architecture and implantation 
potential. 

The TGFβ signalling pathway is active in the blastocyst”>4. However,
its functions are unknown owing to delayed detection of the defects 
transferred with blastoids or trophospheres at E3.3–E3.5, and explanted at
E7.5. n = 6 independent mice. Horizontal bars indicate mean percentage.
**P = 0.003, Student's two-sided t-test. Error bars indicate s.d.
e, f, Morphogenetic functions of TGFβ activators. e, Yield (top) and
cavity area (bottom) of blastoids formed with wild-type (WT), Nodal+/−
and Nodal−/− ES cells. Horizontal bars denote mean yield or cavity

area. ***P= 2.104, two-sided Student's t-test. n = 3 independent 
microwell arrays. Error bars indicate s.d. f, Overall diameter of blastoids, 
trophospheres and trophospheres exposed to BMP4, Nodal or their 
combination. Horizontal bars indicate mean of n = 250 independent
structures. Error bars denote s.e.m. Combinations of BMP4 and 
after loss-of-function mutations (E5.5–E6.5)”. This delay may reflect
a developmental robustness rooted in functional redundancy of ligands 
(for example, Nodal-activin-BMP), plasticity of signalling pathways, 
and/or technical limitations of detection”. Also, the temporal overlap 
of lineage commitment and morphogenesis in early blastocysts limits 
the interpretation of compartment-specific inducible genetic models. 
The blastoid system overcomes these limitations by permitting (i) 
genetic and physical uncoupling of the trophoblast and embryonic 
compartments, and (ii) disentangling blastocyst morphogenetic pro- 
cesses from earlier lineage commitment events. 

In the blastoid, the embryonic cells induced TGFβ signalling pathway
activity in trophoblasts (Extended Data Fig. 8c, d and Supplementary
Table 1 sheets 6, 7). Also, the RNAs of the TGFβ activators Bmp4 and
Nodal are largely restricted to the embryonic cells (Extended Data
Fig. 9a). We concluded that BMP4 and Nodal produced by the embryonic
compartment induce TGFβ activity in the trophectoderm.

To assess loss-of-function in the context of other inductions
(for example, FGF4), we generated ES cells with a heterozygous or
homozygous Nodal knockout (Extended Data Fig. 8b). These cells
had a decreased capacity to form blastoids, which had smaller cavities
(Fig. 4e). Next, we explored TGFβ gain-of-function activators.
Stimulation of trophospheres with BMP4 and Nodal regulated a similar
number of genes (30% overlap). Combined, they upregulated expression
of the transcription factors Cdx2, Id2 and Klf6, induced WNT
activity, which functionally regulates cavitation (see Supplementary
Table 1 sheet 8 and Fig. 1b), and downregulated the expression
of Cldn4 and Krt8, which are overexpressed in differentiated
trophospheres (Extended Data Fig. 9c–f, Supplementary Table 2 sheets
1–3). As for morphogenetic features, Nodal increased the cavitation
Fig. 5 | KLF6 mediates the response to BMP4 and Nodal, and
trophoblast morphogenesis. a, Blastoid formation and stimulation of
trophospheres with BMP4 and Nodal upregulate expression of KLF6,
CDX2 and ID2 in trophoblasts. b, Yield of blastoids formed with wild-type
and Klf6−/− TS cells (two clones). Horizontal bars indicate mean yield.
*P = 0.01, two-sided Student's t-test. n = 3 independent microwell arrays.
c, Cavity area of blastoids formed with wild-type and Klf6−/− TS cells
(two clones). Horizontal bars denote the mean cavity area. **P = 0.005,
two-sided Student's t-test. n = 3 independent microwell arrays. d, Yield
of trophospheres formed with wild-type and Klf6−/− TS cells, and after
stimulation with 45 ng ml⁻¹ BMP4 and 5 ng ml⁻¹ Nodal. Horizontal
bars indicate mean yield. *P = 0.02, **P = 0.001, two-sided Student's
t-test. n = 3 independent microwell arrays. Error bars in b–d denote s.d.
e, Diameters of trophospheres formed with wild-type and Klf6−/− TS
cells, and after stimulation with 45 ng ml⁻¹ BMP4 and 5 ng ml⁻¹ Nodal.
Horizontal bars indicate mean diameter. ***P = 0.001, ****P = 0.0001,
two-sided Student's t-test. n = 80. Error bars are s.e.m. f, Immunostaining
for E-cadherin (DECMA antibody) in blastoids formed with wild-type
and Klf6−/− TS cells. Images are representative of three independent
experiments. Scale bars, 50 μm.


of trophospheres and had a milder effect on their diameter, whereas 
BMP4 induced the opposite effect. BMP4 and Nodal together increased 
both the cavitation (Extended Data Figs. 9g, 10a) and diameter (120%, 
Fig. 4f) of trophospheres. As there was no change in cell numbers 
(Extended Data Fig. 10b), we concluded that BMP4 and Nodal regu- 
late cavitation and swelling, consistent with the regulation of epithelial 
components. Accordingly, TGFβ signalling pathway inhibition using
LDN193189 reduced the size of the blastocoel cavity in blastocysts
(Extended Data Fig. 10c). Thus, BMP4 and Nodal contribute to
trophoblast epithelial morphogenesis.

Blastoid formation and the stimulation of trophospheres with BMP4
and Nodal both upregulated Klf6 in trophoblasts (Fig. 5a, Supplementary
Table 2 sheets 1–3). Consistent with a continuum reflecting trophectoderm
and placenta development, trophospheres strongly expressed Klf6,
which has been implicated in early placenta development”’ (Extended
Data Fig. 6g). We thus generated two Klf6−/− TS cell lines (Extended
Data Fig. 10d) and analysed trophoblast morphogenesis. In 2D culture,
the lines appeared similar to the parental line (morphology, proliferation,
and levels of E-cadherin; Extended Data Fig. 10e). However, mirroring
the ES cell Nodal knockout phenotype, both TS cell lines had a reduced
capacity to form blastoids, which had smaller cavities (Fig. 5b, c).
This defect was partly TS cell autonomous, as their capacity to form
trophospheres was also reduced (Fig. 5d, e). In addition, contrary to their
parental line, Klf6−/− trophospheres did not respond to TGFβ activators
(Fig. 5d, e, Extended Data Fig. 10f). KLF6 has been linked to epithelial
functions and E-cadherin expression”®. Indeed, E-cadherin and Krt8
were downregulated within Klf6−/− blastoids (Fig. 5f, Extended Data
Fig. 10g). Thus, Klf6 is an important target of BMP4 and Nodal,
regulating the epithelial morphogenesis of the trophectoderm.
Fig. 6 | Embryonic inductions drive trophectoderm development
and implantation. Left, trophectoderm proliferation and self-renewal.
FGF4 largely originates from the embryonic compartment. IL-11 is the
most abundantly expressed STAT regulator, and originates from both the
embryonic and trophectoderm compartments. Together, FGF4 and IL-11
regulate trophoblast proliferation and self-renewal, and CDX2 expression.
Right, trophectoderm epithelial maturation and morphogenesis. BMP4
and Nodal largely originate from the embryonic compartment and, besides
maintaining the trophoblast stem-cell state’, regulate trophectoderm
epithelial maturation and morphogenesis. Among WNT ligands'®, WNT6
and WNT7B are expressed in the trophectoderm, and regulate epithelial
morphogenesis. Altogether, signals originating from the embryonic cells
fuel trophoblast proliferation and morphogenesis, and generate a trophoblast
state prone to in utero implantation.


Assigning functional roles to compartment interactions in blastocysts
is challenging owing to (i) the difficulty of forming blastocysts without
ICM cells (trophospheres)*®”’, (ii) the temporal overlap between
lineage commitment and blastocyst morphogenesis, and (iii) the rel- 
ative speed of blastocyst development, which limits the interpretation 
of compartment-specific inducible models. Here, we describe the 
formation of blastoids, which morphologically and transcriptionally 
resemble E3.5 blastocysts, recapitulate key features of trophectoderm 
development, and implant in utero. This model overcomes many 
limitations of blastocyst research and proposes new mechanisms of 
embryonic inductions that drive trophectoderm development and in 
utero implantation (Fig. 6). 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Culture of ES cells. Experiments were done using the following cell lines: V6.5,
H2B-RFP V6.5 sub-clone, PDGFRα-H2B-GFP, Sox17-GFP and IB10. The V6.5 cell
line was derived from a C57BL/6 × 129/Sv background and obtained from the
laboratory of R. Jaenisch. The IB10 cell line was a subclone of the E14 cell line (129/Ola)
and obtained from the laboratory of H. Clevers. The PDGFRα-H2B-GFP cell line
was derived from an ICR background and obtained from the laboratory of A.-K.
Hadjantonakis. The Sox17-GFP cell line was derived from an unknown background
and obtained from the laboratory of S. Morrison. All cells were cultured in 2i
conditions, meaning on gelatin-coated plates in B27N2 medium complemented with
leukaemia inhibitory factor (LIF, 10 ng ml⁻¹), PD0325901 (1 μM, AxonMed 1408)
and CHIR99021 (3 μM, AxonMed 1386) as developed previously’. Cells were
routinely passaged every two days for 15 days before being used for blastoid formation.
Culture of TS cells. Experiments were done using the F4, F1 and Cdx2-eGFP
cell lines. The TS cell lines F4 (ICR × ICR) were obtained from the laboratory of
J. Rossant. The F1 (CBA × C57BL/6) and Cdx2-eGFP TS cell lines were derived
by N. Rivron according to the methods described previously”. Cdx2-eGFP TS
cell lines were derived from a mouse reporter line in which eGFP is fused to the
endogenous locus of Cdx2, as described previously’”. Cells were then cultured on
Matrigel in phenol red-free TX medium, a serum-free medium developed
previously'?. TX medium contains DMEM/F12 (phenol red-free, with L-glutamine),
L-ascorbic-acid-2-phosphate (64 μg ml⁻¹), sodium selenite (14 ng ml⁻¹), insulin
(19.4 μg ml⁻¹), sodium bicarbonate (543 μg ml⁻¹), holo-transferrin (10.7 μg ml⁻¹),
penicillin streptomycin, FGF4 (25 ng ml⁻¹), TGFβ1 (2 ng ml⁻¹) and heparin
(1 μg ml⁻¹). Alternatively, and when mentioned, TS cells were cultured in serum-rich
conditions (TS medium) as described previously”. Cells were routinely passaged
every 4 days before being used for blastoid formation.

Culture of other cell types. Mouse EpiSCs were cultured in serum-free conditions,
as previously described*’. Human ES cells were cultured in E8 medium. C2C12
and COS7 cells were cultured in DMEM (Gibco, phenol red-free with L-glutamine),
supplemented with 10% fetal bovine serum (FBS) (Sigma-Aldrich).

Microwell arrays. Microwell arrays were formed as previously described”* and
inserted into 12-well plates or directly imprinted into 96-well plates. Each microwell
array included 1,000 or 400 microwells, respectively, with a diameter of 200 μm, as
described in Extended Data Fig. 1a.

Culture of blastoids. The full protocol was repeated independently in the Hubrecht and
MERLN institutes, and is available at Protocol Exchange” and on http://www.blastoid.org.
ES cells were seeded by dispensing a cell suspension in DMEM medium (Gibco
31966021) with non-essential amino acids (Gibco, 11140-050), β-mercaptoethanol
(Gibco, 21985-023), 10% FBS (Sigma-Aldrich) and LIF (10° U ml⁻¹), on top of the
microwell arrays with a cell concentration resulting in the pooling of a mean of five
cells per microwell. Within 24–36 h, the ES cells formed tight, round aggregates. TS
cells were then seeded on top of the ES cell aggregates at a concentration resulting
in the pooling of a mean of 12 cells per microwell. Upon settling of the cells within
the microwells, TX medium was added and complemented with Y27632 (20 μM,
AxonMed 1683), CHIR99021 (3 μM, AxonMed 1386), 8Br-cAMP (1 mM, Biolog Life
Science Institute BOO7E), FGF4 (25 ng ml⁻¹, R&D systems 5846F4), TGFβ1 (15 ng
ml⁻¹, Peprotech 100-21), IL-11 (30 ng ml⁻¹, Peprotech 200-11) and heparin (1 μg ml⁻¹).
The time of TS cell seeding is considered as the starting point (0 h). Within 24 h, the TS
cells aggregated with the ES cells. At 24 h, 1 mM 8Br-cAMP was added to the medium.
Within 48 h, 2–8% of the aggregates formed a cavity, which expanded and stabilized by
65 h. A blastoid is defined based on the morphological parameters of E3.5 blastocysts,
as a cystic structure with an outer circularity greater than 0.9 (circularity =
4π(area/perimeter²)) and a diameter between 70 and 110 μm, including a single
regular cavity lined by a single layer of TS cells and including ES cells. Cystic
structures refer to all the TS-cell cavitated structures with a cavity diameter greater than
20 μm. WNT3A-conditioned medium was obtained from the laboratory of H. Clevers;
WNT3A recombinant protein was obtained from Cell Guidance Systems. XAV939,
a tankyrase inhibitor (Tocris Bioscience), is used as an antagonist of WNT signalling
and acts via stimulation of β-catenin degradation and stabilization of axin; XAV939
was used at 15 μM.
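
The morphometric part of the blastoid definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Structure` record and its field names are hypothetical (for example, values exported from an image-analysis tool), and the check covers only the measurable criteria (circularity, overall diameter, a single cavity larger than 20 μm). It does not test whether the cavity is lined by a single layer of TS cells or whether ES cells are present, which the definition also requires.

```python
import math
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical per-structure measurements (field names are assumptions)."""
    area_um2: float            # projected area of the outer outline
    perimeter_um: float        # perimeter of the outer outline
    diameter_um: float         # overall diameter of the structure
    cavity_diameter_um: float  # maximal diameter of the largest cavity
    n_cavities: int            # number of cavities observed


def circularity(area: float, perimeter: float) -> float:
    """Circularity = 4*pi*area / perimeter**2; equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def is_cystic(s: Structure) -> bool:
    """Cystic structure: cavity diameter greater than 20 um."""
    return s.cavity_diameter_um > 20.0


def passes_blastoid_morphometrics(s: Structure) -> bool:
    """Measurable criteria only: cystic, one cavity, circularity > 0.9,
    overall diameter between 70 and 110 um."""
    return (
        is_cystic(s)
        and s.n_cavities == 1
        and circularity(s.area_um2, s.perimeter_um) > 0.9
        and 70.0 <= s.diameter_um <= 110.0
    )
```

A circular outline of 90 μm diameter with a 40 μm cavity passes these checks, whereas a square outline of the same width has a circularity of about 0.79 and is rejected.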

Trophospheres were obtained by seeding TS cells (mean of 12 cells per microwell)
in the same medium as used for blastoids. TX medium without growth factors
(blank medium) refers to medium containing DMEM/F12 (phenol red-free,
with L-glutamine), L-ascorbic-acid-2-phosphate (64 μg ml⁻¹), sodium selenite
(14 ng ml⁻¹), insulin (19.4 μg ml⁻¹), sodium bicarbonate (543 μg ml⁻¹),
holo-transferrin (10.7 μg ml⁻¹) and penicillin streptomycin.

TCF luciferase reporter assay. A TOP/TK-Renilla reporter plasmid system was
used for the detection of β-catenin-driven WNT transcriptional activity, as
previously described**. The TOP reporter construct contains three optimal copies
of T cell factor (TCF)/lymphoid enhancer factor (LEF) transcription factor sites
upstream of a thymidine kinase minimal promoter that, when bound by β-catenin,
induces transcription of the luciferase reporter gene. The Renilla reporter construct
contains thymidine kinase-Renilla luciferase (TK-Renilla), which drives strong
WNT-independent activity of the Renilla gene and serves as a measure of cell
viability. Luciferase reporter gene analysis was performed in TS cells cultured in
2D: 16,000 cells were seeded into each well of a 96-well plate in TX culture
conditions. After 24 h, the cells were transiently co-transfected using Lipofectamine
3000 (Invitrogen, L3000001). Cells were stimulated 16 h later with TX or blastoid
culture medium and the Porcupine inhibitor IWP2 (2.5 μM). TX without growth
factors was used as a negative control and WNT3A-conditioned medium was used
as a positive control. After 24 h, cells were lysed using passive lysis buffer and
luciferase and Renilla activity was measured with the Dual Luciferase Reporter Assay
System (Promega, E1910). Each condition was performed in quadruplicate and
the reporter activity was expressed as mean ± s.d.
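
As a rough illustration of how such quadruplicate readings could be summarized, the sketch below divides each luciferase reading by the matching Renilla reading and reports the mean and s.d. of the ratios. The per-well normalization to Renilla is an assumption made here (the text only states that Renilla serves as a measure of cell viability), as is the use of the sample standard deviation; the function names are illustrative.

```python
from statistics import mean, stdev


def normalized_activity(luciferase, renilla):
    """Per-well reporter activity: luciferase signal divided by Renilla signal.

    Assumption: readings are paired by well and Renilla is used as the
    normalizer for cell number and viability.
    """
    if len(luciferase) != len(renilla):
        raise ValueError("luciferase and Renilla readings must be paired")
    return [f / r for f, r in zip(luciferase, renilla)]


def summarize(luciferase, renilla):
    """Return (mean, sample s.d.) of the normalized activity across replicates."""
    ratios = normalized_activity(luciferase, renilla)
    return mean(ratios), stdev(ratios)


def fold_over_control(condition_mean, control_mean):
    """Express a condition relative to the negative control (blank TX medium)."""
    return condition_mean / control_mean
```

With four replicate wells per condition, `summarize` returns the two numbers that would be plotted as mean ± s.d.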

Screening for regulators of Cdx2 in TS cells. The TS cell line used for this assay is
homozygous for Cdx2-eGFP and was seeded at 20,000 cells per cm² on Matrigel-coated
plates, in TX medium including 25 ng ml⁻¹ FGF4 and 2 ng ml⁻¹ TGFβ1. After
24 h, the medium was replaced by TX medium without FGF4 and TGFβ1 (blank
medium) and candidate molecules were added. PD0325901 (1 μM) was used as
a negative control to downregulate CDX2 expression (see Extended Data Fig. 2).
Cells were assessed for eGFP expression 48 h after the addition of the candidate
molecules, using a BD FACSCalibur. Data were analysed using FlowJo. The basal
level of Cdx2-eGFP expression was defined by gating the population of TS cells
cultured for 48 h in blank medium (see Extended Data Fig. 3).
Immunohistochemistry. Immunofluorescence was performed as described
by the laboratory of J. Rossant (http://lab.research.sickkids.ca/rossant/lab-resources/).
Antibodies were used against CDX2 (MU392A-UC; 1:400 dilution),
Nanog (Abcam ab84447; 1:200 dilution), OCT4 (Santa Cruz 5279; 1:200
dilution), GATA6 (R&D Systems AF1700; 1:200 dilution), PDGFRα (R&D Systems
AF1062; 1:150 dilution), ELF5 (Santa Cruz sc-9645; 1:250 dilution), MASH2 (also
known as ASCL2; Genetex GTX60272; 1:250 dilution), TEAD4 (Abcam ab58310; 1:400
dilution), proliferin (E10, Santa Cruz sc-271891; 1:250 dilution), TPBPA
(Abcam ab104401; 1:250 dilution), placenta lactogen (P17, Santa Cruz sc-34713;
1:200 dilution), HAND1 (Abcam ab115256; 1:250 dilution) and E-cadherin
clone DECMA-1 (Sigma U3254; 1:500 dilution). All images were taken using a
PerkinElmer Ultraview VoX spinning disk microscope combined with a Leica SP8.
Derivation of de novo ES and TS cell lines from blastoids and injection into
blastocysts. Blastoids cultured for 65 h and freshly isolated E3.5 blastocysts (E3.5
CBA × C57BL/6) were individually transferred into wells of 96-well plates
containing mouse embryonic fibroblasts. TS cell lines were derived in TS medium
containing 20% FBS, 25 ng ml⁻¹ FGF4, 2 ng ml⁻¹ TGFβ1 and 1 μg ml⁻¹ heparin, as
described previously”. ES cell lines were derived in B27N2 medium supplemented
with 10 ng ml⁻¹ LIF, 1 μM PD0325901 and 3 μM CHIR99021 (2i medium). The
derivation was considered successful if colonies appeared on passage two after
blastocyst or blastoid plating. ES or TS cells (12–15 cells) were injected into E3.5
blastocysts from C57BL/6 mice. The injected blastocysts were transferred into the
uterus of CBA × C57BL/6 pseudo-pregnant females. Embryos were explanted from
the uterus and dissected from the deciduae at E6.5.

Uterus transfer. Pseudo-pregnant F1 females were selected for oestrus and
placed with vasectomized males in the evening. Mice were considered to have mated at
midnight, and the next morning, plugged females were separated. Blastoids and
trophospheres were cultured and selected as described above, and 25 blastoids 
or trophospheres were injected into both horns of the uterus of day E3.3-E3.5 
pseudo-pregnant Bl6/CBA females. The blastoids and trophospheres were trans- 
ferred on the ovary side of the uterus, using a medium containing LIF. Uteri were 
explanted on day E6.5 or E7.5, fixed in 4% paraformaldehyde and processed for 
paraffin embedding and histology. 

qRT-PCR. RNA was obtained using the TRIzol reagent (Thermo Scientific 15596-018)
extraction method. Reverse transcription was performed using the Superscript III kit
(Invitrogen 18080-044). Quantitative PCR was performed using IQ SYBR Green
Supermix (Bio-Rad 1708880) and 30 ng of RNA using the following gene-specific
primers: Cdx2: 5′-AAAGTGAGCTGGCTGCCACACTTG-3′,
5′-TCCATCAGTAGATGCTGTTCGTGG-3′; Tfap2c: 5′-GAGGTGCAGAATGTGGACGA-3′,
5′-CCCCAAAGGGTTCTTGGTCA-3′; Tead4: 5′-TTGAGCGAAGCTTCCAGGAG-3′,
5′-TTCCGACCATACATCTTGCCT-3′; Gata3: 5′-GCTCCTTGCTACTCAGGTGAT-3′,
5′-GGAGGGAGAGAGGAATCCGA-3′; Eomes: 5′-TGATCATCACCAAACAGGGC-3′,
5′-ACTGTGTCTCTGAGAAGGTG-3′; Elf5: 5′-TTCGCTCGCAAGGTTACTCC-3′,
5′-TETTCGGCTGTGACAGTCTT-3′; Hand1: 5′-GCCTACTTGATGGACGTGCT-3′,
5′-TGCTGAGGCAACTCCCTTTT-3′; Cdh1: 5′-CCAAGCACGTATCAGGGTCA-3′,
5′-ACTGCTGGTCAGGATCGTTG-3′; Id2: 5′-CCTGCATCACCAGAGACCTG-3′,
5′-GGGAATTCAGATGCCTGCAA-3′; Sox2: 5′-GATCAGCATGTACCTCCCCG-3′,
5′-CTGGGCCATGTGCAGTCTAC-3′; Actb: 5′-TGTCGAGTCGCGTCCACC-3′,
5′-TCGTCATCCATGGCGAACTGG-3′.
RNA sequencing. Blastocysts were flushed from the uterus of CBA × C57BL/6
E3.5 mice and sorted as E3.25 or E3.5 blastocysts according to their phenotype:
early blastocysts had a smaller number of cells, a smaller diameter, a smaller
blastocoel and a more prominent ICM than late blastocysts. Blastoids were formed
using F1 wild-type or GFP+ TS cells and V6.5 wild-type or H2B-RFP ES cells. For
the bulk sequencing, blastocysts or blastoids (n = 50) were pooled into TRIzol.
The transcriptional profile of virtual blastoids was determined by combining in
silico, at a 1:3.5 ratio, ES and TS cells cultured in 2D, in blastoid medium, for 65 h.
For single-cell sequencing, blastoids or Tyrode's acid-treated, zona-free blastocysts
were sequentially exposed to 0.3 mg ml⁻¹ collagenase IV (Gibco, 17104019) and
a 1:3 dilution of Tryple Express Select X10 (Thermofisher A1217701), and
subsequently dissected with glass capillaries of different diameters ranging from 100
to 20 μm. Single cells were FACS-sorted into 384-well plates as described
previously**. For this analysis, we analysed a total of 1,577 cultured cells, comprising
290 parental TS cells cultured in 2D in TX medium; 316 parental ES cells cultured in
2D in 2i medium; 367 blastoid cells; 336 TS cells cultured alone in blastoid medium
within microwells, which formed trophospheres; and 268 ES cells cultured alone in
blastoid medium within microwells, which formed embryoid bodies; in addition to
60 blastocyst cells extracted from three different E3.5 blastocysts.
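
The in silico combination used for the virtual blastoids can be pictured as a weighted average of two expression profiles. The sketch below is an assumption-laden illustration, not the published pipeline: it reads the 1:3.5 ratio as ES:TS (following the word order in the text), scales each profile to fractions of its own total before mixing so that sequencing depth does not dominate, and uses hypothetical function names.

```python
def to_fractions(counts):
    """Scale a count vector so that it sums to 1 (assumed depth normalization)."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("profile must contain at least one count")
    return [c / total for c in counts]


def virtual_blastoid(es_counts, ts_counts, es_parts=1.0, ts_parts=3.5):
    """Mix ES and TS profiles gene by gene at es_parts:ts_parts.

    Both inputs must list the same genes in the same order. The default
    1:3.5 (ES:TS) weighting is an interpretation of the text.
    """
    if len(es_counts) != len(ts_counts):
        raise ValueError("profiles must cover the same genes")
    es = to_fractions(es_counts)
    ts = to_fractions(ts_counts)
    total_parts = es_parts + ts_parts
    return [(es_parts * e + ts_parts * t) / total_parts for e, t in zip(es, ts)]
```

For two genes expressed exclusively in ES or TS cells, the mixed profile assigns them 1/4.5 and 3.5/4.5 of the total signal, respectively.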

CEL-Seq library preparation. For the bulk sequencing experiment (Fig. 3a), the 
whole RNA from blastocysts and blastoids was extracted from TRIzol. For cell 
lines, 20 ng of total extracted RNA was used as starting material, as measured by 
Qubit RNA assay (Life Technologies). All samples were processed using CEL- 
Seq2 protocol™4, the CEL-Seq1 primers*® and the Life Technologies Ambion kit 
(AM1751) as previously described*®. For the single-cell sequencing experiments, 
cells were processed according to the SORT-seq method, as previously described°? 
and clustered using the RaceID method, as previously described**. t-SNE map
representation of transcriptome similarities was based on 1,577 cells collected from 
parental ES cells cultured in 2D in 2i medium (316 cells), parental TS cells cultured 
in 2D in TX medium (290 cells), blastoid cells (367 cells), TS cells cultured in
blastoid medium within microwells (that is, trophospheres; 336 cells) and ES cells cultured
in blastoid medium within microwells (that is, embryoid bodies; 268 cells). Libraries
were sequenced on an Illumina Nextseq 500 using 75-bp high output paired-end 
sequencing. Transcription factors were selected according to the Riken transcrip- 
tion factor database, and lowly expressed ones were filtered out before plotting the 
heat maps. Differential expression analysis was done using the DESeq package*”,
and pathway enrichment analysis was done using DAVID** and GOrilla”.

Gene set enrichment analysis. Analysis was performed according to the standard
procedure of gene set enrichment analysis (GSEA)
(http://software.broadinstitute.org/gsea/). The gene set was established by extracting
genes that were differentially regulated between the cells of the trophectoderm of
blastocysts (27 cells) and TS cells (33 cells). The pre-ranked list of genes differentially
expressed between the trophoblasts of blastoids (59 cells) and TS cells (33 cells) was
established by extracting all the genes that were differentially regulated between
these populations, which were defined based on the RaceID clustering. The gene set
consisted of 281 differentially regulated genes with P < 0.05.
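
For readers unfamiliar with the statistic behind a pre-ranked GSEA, the enrichment score (ES) is the maximum deviation from zero of a weighted running sum that steps up at genes belonging to the gene set and steps down otherwise. The sketch below computes only that running-sum statistic; it is not the GSEA software, it does not compute the normalized enrichment score (NES), P value or FDR (which require permutations), and the function and parameter names are illustrative.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a pre-ranked gene list.

    ranked_genes: gene names ordered from most positive to most negative score
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene names to test
    weight:       exponent applied to |score| at each hit
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it entirely")

    running = 0.0
    best = 0.0
    for w, hit in zip(hit_weights, in_set):
        # Step up by the normalized hit weight, or down by 1/n_miss at a miss.
        running += w / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A gene set concentrated at the top of the ranked list gives a score close to +1, and one concentrated at the bottom gives a score close to −1.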

Curated gene sets. Curated gene sets were used to identify, within the list of GSEA
enriched genes, genes related to the TGFβ signalling pathway or to epithelial cells
(highlighted in red and green, respectively, in Supplementary Table 1 sheet 5). The
gene set for the TGFβ pathway was composed of the GSEA gene sets: Reactome
signalling by TGFβ receptor complex; KEGG TGFβ signalling pathway; Biocarta
TGFβ pathway, and PID SMAD2/3 nuclear pathways. The gene set for the epithelial
cells was composed of the GO gene sets: GO:0002066; GO:0002065; GO:0090136; 
GO:0002064; GO:0030855; GO:0072148; GO:0002070; GO:0003382; GO:0050673; 
and GO:0060429. 

Whole-mount single-molecule FISH. The Affymetrix ViewRNA ISH Cell Assay
Kit was used to perform whole-mount single-molecule FISH on freshly isolated
E3.5 blastocysts. The probes used were designed to hybridize against Id2 (catalogue
VB6-10967), Bmp4 (catalogue VB1-13681), Nodal (catalogue VB6-18786) and Il11
(catalogue VB6-19190-VC) mRNA. Blastocysts were imaged in steps of 0.5 μm
using a 63× objective on a PerkinElmer Ultraview VoX spinning disk microscope
combined with a Leica SP8.

BMP4 and Nodal screens. The screen was performed by adding proteins within
the blastoid medium at both 0 and 48 h. Images were acquired over 800 microwells
per condition. The yield of blastoids (Fig. 4b) is measured as described in
the 'Culture of blastoids' section. The yield of cystic structures (Figs. 4d and 5) is
measured as the percentage of microwells that contained a structure with a cavity
that has a maximal diameter >20 μm. The diameter of cystic structures (Fig. 4e)
is measured by image analyses using the microscope measurement tool, Fiji, by
measuring the larger diameter of the larger cavity within each microwell. The
circularity is measured using Fiji, as follows: circularity = 4π(area/perimeter²).
Combinations of BMP4 and Nodal were: 5 ng ml⁻¹ BMP4 + 5 ng ml⁻¹ Nodal;
5 ng ml⁻¹ BMP4 + 45 ng ml⁻¹ Nodal; 45 ng ml⁻¹ BMP4 + 5 ng ml⁻¹ Nodal;
45 ng ml⁻¹ BMP4 + 45 ng ml⁻¹ Nodal.
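
The yield measure and the dose grid described above are simple enough to express directly. The following sketch is illustrative only: it assumes one maximal cavity diameter per imaged microwell (0 where no cavity was found), and the variable and function names are hypothetical.

```python
from itertools import product

CAVITY_THRESHOLD_UM = 20.0  # cavity diameter above which a structure counts as cystic


def cystic_yield(max_cavity_diameters_um):
    """Percentage of microwells whose largest cavity exceeds the threshold.

    Expects one value per imaged microwell; use 0 for wells without a cavity.
    """
    n_wells = len(max_cavity_diameters_um)
    if n_wells == 0:
        raise ValueError("at least one microwell is required")
    n_cystic = sum(1 for d in max_cavity_diameters_um if d > CAVITY_THRESHOLD_UM)
    return 100.0 * n_cystic / n_wells


# The four tested combinations, as (BMP4, Nodal) doses in ng per ml.
DOSES_NG_PER_ML = (5, 45)
COMBINATIONS = list(product(DOSES_NG_PER_ML, repeat=2))
```

Iterating over `COMBINATIONS` and calling `cystic_yield` on the wells imaged for each condition reproduces the layout of the screen, one percentage per dose pair.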

CRISPR knockout. Two guide RNAs (gRNAs) were designed to flank the first exon
of Nodal or the first two exons of the Klf6 gene. ES and TS cells were transfected
with a plasmid for expression of Cas9 protein, gRNAs and the puromycin-resistance
gene. ES cells were transfected in 2D feeder-free conditions; TS cells were
transfected in suspension for 4 h. Both ES and TS cells were subsequently plated
on mouse embryonic fibroblasts. Puromycin was added to the culture media 24 h
after transfection to apply selection during the following 48 h. Colonies were
allowed to grow until they could be picked, then screened for deletion of
the region of interest.

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Data generated during the study are available in the 
Supplementary Tables and in the Gene Expression Omnibus (GEO) public repos- 
itory under accession GSE99786. 
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Extended Data Fig. 1 | In vitro formation of blastoids. a, Non- 
adherent hydrogel microwell arrays were formed by replica molding 
using polydimethylsiloxane (PDMS) stamps as previously described”*°. 
The array used for 12-well plates contains 1,000 cylindrical structures 

of 200 1m diameter and height. Upon cell seeding, owing to the non- 
adherent properties of the hydrogel, all cells slide into the microwells. 
Upon settling, the number of ES cells per microwell follows a distribution. 
Mean number per microwell: 5.2; half of the microwells contained between 
4 and 6 cells per microwell (right, top). Upon aggregation of ES cells 
(24-36 h depending on the cell line), TS cells were seeded. Mean number 
per microwell: 12; half of the microwells contained between 10 and 14 
cells (right, bottom; time point 0h). b, After culture in blank medium 
(see Methods), TS and ES cells mostly arranged into non-organized 
structures forming trophoblast cysts. TS cells cultured without ES cells
(65h) formed fewer cystic structures than a co-culture. Rarely, TS cells 
enclosed ES cells and formed regular cystic structures morphologically 
similar to the blastocyst (white arrow). n = 4 independent microwell 
arrays. Structures shown in b were taken out of the microwell array at 
65h. Scale bars, 100 µm. c, d, Optimization of the engulfment of ES cells
by TS cells. c, Dosing the number of TS cells. ES cells were seeded at
t = −24/36 h. At t = 0 h, different numbers of TS cells were seeded on
top of the ES cell aggregates. After 24h, the engulfment efficiency was
defined by measuring the percentage of coverage of TS cells around ES 
cells. The most efficient yields were observed when more than 12 TS 
cells were added to ES cell cultures. n = 250 independent microwells. The 
centre value is the mean, error bars are s.d. d, Left, optimization of the 
concentration of Y27632. At t= 0h, different concentrations of Y27632 
were added. At t= 24h, the engulfment efficiency was measured as the 
percentage of coverage of TS cells around ES cells. The most efficient 
yields were obtained with 20 µM Y27632. n = 250 independent microwells.
The centre value is the mean. Right, optimized engulfment. Images of ES
cells (red) engulfed by TS cells (green) using the optimized conditions
(mean of 12 TS cells per microwell and 20 µM Y27632, 80% engulfment
efficiency). Images are representative of three independent experiments.
Scale bar, 100 µm. Error bars are s.d. e, Blastocoel area of blastocysts
formed from E2.5 CBA x C57BL/6 morula, selected for initiation of 
compaction and cultured in M16 medium for 24h along with antagonists 
of WNT (XAV939, 15 µM), PKA (H89, 10 µM) or DMSO (1:1,000,
control). n = 10 independent blastocysts. P = 0.015 and P = 0.002, two-sided
Student’s t-test. The centre values are medians. Error bars are s.d.
f, RNA-sequencing data for Wnt6 and Wnt7b, in E3.25 and E3.5
blastocysts, TS cells (TSC) and ES cells (ESC). TS cells were cultured
in serum-free (TX) or serum-rich (TS) medium (see Methods).
Differentiation was induced by the removal of growth factors. g, TCF 
luciferase assay for WNT activity in TS cells (see Methods). WNT 
secretion was blocked using IWP2 (2.5 µM). n = 4 independent biological
samples of TS cell culture. *P = 0.045, **P = 0.0057, two-sided Student’s
t-test. The centre values are medians. Error bars are s.d. h, Induction of
cavitation. Blastoids were defined as described in the Methods. ES and TS
cells were seeded in serum-free TX medium including Y27632 (20 µM)
and a WNT modulator. WNT3A-conditioned medium (50% of the total
volume), CHIR99021 (3 µM), or the combination of CHIR99021 (3 µM)
and XAV939 (15 µM) was added at the time of TS cell seeding (t = 0 h).
n = 3 independent microwell arrays. *P = 0.017, two-sided Student's t-test.
The centre values are medians. Error bars are s.d. i, Yield of blastoids
depending on the initial ratio of TS to ES cells, at t= 0h, within individual 
microwells (left). Yield of blastoids formed using three lines of TS cells 
representative of the scope of efficiency observed (right). Different lines 
were isolated upon CBA x C57BL/6 matings (F1.1, F1.2, derived by N.R.)
and ICR x ICR matings (F4, provided by J. Rossant). n = 3 independent
microwell arrays. The centre values are the mean. Error bars represent s.d.
The red line represents the median of the three cell lines (5% of the total
number of microwells per array). j, Bright-field images of a representative
E3.5 blastocyst (top) and a blastoid (bottom). Scale bar, 50 µm.
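The seeding statistics in panel a of Extended Data Fig. 1 (mean 5.2 ES cells with half of the microwells holding 4 to 6 cells; mean 12 TS cells with half holding 10 to 14) can be checked against a simple model. The caption only says the counts follow "a distribution"; the sketch below assumes a Poisson distribution, which is an assumption of this example and not a statement from the paper. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k cells under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_in_range(mean: float, low: int, high: int) -> float:
    """Fraction of microwells expected to hold between low and high cells (inclusive)."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


# Values taken from the caption; the Poisson model itself is assumed.
es_fraction = fraction_in_range(5.2, 4, 6)     # ES cells, mean 5.2 per microwell
ts_fraction = fraction_in_range(12.0, 10, 14)  # TS cells, mean 12 per microwell

print(f"ES cells, 4-6 per microwell: {es_fraction:.2f}")
print(f"TS cells, 10-14 per microwell: {ts_fraction:.2f}")
```

Under this assumed model both fractions come out close to one half, which is consistent with the ranges reported in the caption.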
Extended Data Fig. 2 | Regulation of trophoblast cells. a, The number
of Cdx2 transcript reads per million mapped reads (r.p.m.) as measured
in bulk samples of: ES cultured in 2i medium and TS cells cultured
in TX medium, TS cells cultured in blastoid medium, and blastoids
and blastocysts. Note that the blastoids and blastocysts comprise both
trophectoderm and ICM cells, the latter expressing only very low
levels of Cdx2. n = 2 independent biological samples (see Methods).
b, Measurement of CDX2 fluorescent intensity in blastoids and
blastocysts. Blastoids were fixed and stained with an anti-CDX2 antibody
(see Methods). E3.5 blastocysts were used as a positive control. n = 15
independent blastoids or blastocysts. Error bars are s.d. c, The number of
Stat3 transcripts per million reads as measured in bulk samples of ES and
TS cells and blastocysts. d, Immunostaining for phosphorylated STAT3 in
a representative blastocyst colony of ES and TS cells (see Methods). Scale
bar, 50 µm. e, TS cells cultured in the presence of different concentrations
of the STAT/GP130 pathway inhibitor SC144. Both growth and viability
were affected by concentrations of at least 1 µM. f, The number of
transcripts per million reads for the STAT pathway ligands Il11, Lif and
Il6, along with their receptors, as measured in bulk samples of ES cells, TS
cells and blastocysts. g, Whole-mount single-molecule FISH for Il11, in an
E3.25 blastocyst. Scale bar, 50 µm. See Methods for further details.
Extended Data Fig. 3 | Screen for regulators of CDX2 expression in TS
cells. a, Calibration of the cell lines. The assay is performed using
CDX2-eGFP+ TS cells and as described in the schematic and the Methods.
For the initial calibration of the assay, gating was set so that wild-type TS
cells do not appear in the gate (non-specific fluorescence, left FACS plot).
In that condition, after 48 h, the following appeared in the gate: 50%
of the CDX2-eGFP+ cells cultured in blank TX medium; 88% of the
CDX2-eGFP+ TS cells cultured in TX medium including FGF4 (25 ng ml⁻¹)
and TGFβ1 (2 ng ml⁻¹); 10% of the CDX2-eGFP+ TS cells cultured
in TX medium including FGF4 (25 ng ml⁻¹), TGFβ1 (2 ng ml⁻¹) and
PD0325901. Biological triplicates show very minor variability (n = 3
independent biological samples, right). Error bars are s.d. Scale bar,
100 µm. b, Calibration of the assay and primary screen. For the primary
screen of proteins and small molecules, the condition with TX medium 
including FGF4 and TGFβ1 was used as a positive control. The gating was
set up such that 50% of these cells appear in the gate. The condition with
TX medium without FGF4 and TGFβ1 was used as a negative control,
and 20% of these cells appear in the gate. 8Br-cAMP (0.04 to 5 mM),
IL-11 (4 to 500 ng ml⁻¹), LIF (3 to 375 ng ml⁻¹), BMP4 (1 to 125 ng ml⁻¹),
and IGF2 (1 to 125 ng ml⁻¹) were added to the medium. The value is the
measurement of a single sample. The typical s.d. for this assay is shown
in a. c, Secondary screen: combinations of hits. 8Br-cAMP (1 mM), IL-11
(30 ng ml⁻¹), or 8Br-cAMP (1 mM) + IL-11 (30 ng ml⁻¹) were added to
TX medium including FGF4 and TGFβ1. The full blastoid medium was
also tested (see Methods). d, e, Markers of multipotency, differentiation
and epithelization in stimulated TS cells. TS cells were grown for 48h in
blank medium (see Methods), TX medium (including FGF4 and TGFβ1)
or TX medium supplemented with 8Br-cAMP and IL-11. Representative
bright-field images of TS cells grown for 48h in TX medium or in blastoid
medium are shown (d), along with gene expression as characterized using
bulk RNA sequencing (e, n = 3). Scale bars, 1,000 µm.
Extended Data Fig. 4 | Evolution of the number of cells in the blastoid
compartments. a, E3.5 and E4.5 blastocysts (n = 6) were immunostained
using Nanog and PDGFRα antibodies, and nuclei were counterstained
with DAPI (see Methods). b, Percentage of blastoids including GATA6+
cells from a total number of 14 blastoids. c, Number of cells in each of
the blastoid or blastocyst compartments (trophoblast, embryonic
and primitive endoderm), counted based on immunostaining
for Nanog, GATA6 and PDGFRα. The number of trophoblasts was
counted based on the expression of constitutive GFP. The quantification
was done at 48, 72 and 86h. Note that, for most experiments presented
herein, blastoids are harvested at 65h (see Methods). The horizontal bars
represent the mean number of cells. The coefficient of variation (relative
s.d.) is 19.4% for E3.5 blastocysts and 23.6% for blastoids at 48h. n = 6
independent E3.5 blastocysts and E4.5 blastocysts.
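The coefficient of variation (relative s.d.) quoted in Extended Data Fig. 4 is the standard deviation divided by the mean, expressed as a percentage. A minimal sketch of that calculation follows; the cell counts are hypothetical placeholders, not data from the study, and the use of the sample standard deviation (n − 1 denominator) is an assumption because the caption does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(counts: list[float]) -> float:
    """Relative standard deviation in percent (sample s.d. divided by mean)."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # sample s.d.; an assumption of this sketch
    return 100.0 * sd / mean


# Hypothetical total cell counts for three structures (illustration only).
example_counts = [40, 50, 60]
cv_percent = coefficient_of_variation(example_counts)
print(f"CV = {cv_percent:.1f}%")
```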
Extended Data Fig. 5 | De novo derivation and developmental potential
of blastoid stem cells. a, De novo TS and ES cell lines were derived from
CDX2-high and CDX2-low blastoids, and blastocysts (see Methods). The
derivation was considered successful if colonies appeared on passage 2
after blastocyst or blastoid plating. The derivation efficiency of TS cells
from CDX2-high blastoids was significantly higher (****P = 0.0001,
Student's t-test) than CDX2-low blastoids and similar to blastocysts
(left). n = 22 independent blastoids. Error bars are s.d. The potential
of de novo-derived cell lines to chimaerize embryos was assessed by
immunohistochemistry, using an anti-GFP antibody targeting GFP+
TS cells or an anti-RFP antibody targeting H2B-RFP+ ES cells (middle).
Autofluorescence (denoted by an asterisk), probably due to the presence
of blood, occurs within the ecto-placental cone of chimaeric embryos
but is lower than the fluorescence recorded from chimaeric embryos
after the injection of GFP+ TS cells (right). The top image shows a
chimaeric embryo after the injection of GFP+ TS cells. The bottom image
shows a chimaeric embryo after the injection of H2B-RFP+ ES cells.
Both embryos are imaged in the GFP channel. Note the presence of a
small patch of autofluorescence (denoted by an asterisk) in the bottom
image. Microscope settings were adjusted to limit the acquisition of
autofluorescence, and images were taken upon excitation with a 450-490 nm
(green) laser. Scale bars, 100 µm. This experiment is representative of
three similar experiments. b, Immunohistochemistry using antibodies
against CDX2, ELF5, TEAD4, ASCL2, HAND1 and proliferin, within
tissue sections of a representative decidua including a blastoid grown
in utero. Scale bar, 100 µm. This experiment is representative of
three similar experiments. c, Percentage of implantation sites in uterus
transferred with blastoids and vehicle medium at E3.3-E3.5, and
explanted at E7.5. n = 5 mice. The value is the mean. Error bars denote
the s.d. **P = 0.01, Student’s t-test. d, Uterus explanted at E7.5 after a
physiological development (blastocyst-induced), the transfer of blastoids
only in the left horn (blastoid-induced, also see the same uterus in Fig. 2h),
and the instillation of oil (oil-induced). e, Histological sections of deciduae
produced by blastocysts, blastoids and oil instillation, and stained using
an anti-ALDH3A1 antibody. Scale bar, 100 µm. This experiment is
representative of three similar experiments.
Extended Data Fig. 6 | Bulk and single-cell RNA sequencing of blastoids
and blastocysts. a, b, Bulk RNA sequencing. a, Transcription factor 
expression distance map. Differentially expressed transcription factors 
between the ES cells, TS cells, blastoids, virtual blastoids (see Methods), 
E3.25 and E3.5 blastocysts (P < 0.05, DESeq negative binomial 
distribution) were used to generate a non-supervised distance map of 
transcription factors selected according to the Riken transcription factor 
database. The scale is log. b, Gene ontology (GO) and KEGG pathways. 
Genes differentially regulated (P < 0.05, DESeq negative binomial 
distribution) were analysed using DAVID*® and corresponding GO 

and KEGG pathways are presented. The full lists of genes related to the 
depicted GO terms are in Supplementary Table 1 sheet 1. c—e, Single-cell 
sequencing. c, Schematic depicting the origin of the single cells. 

t-SNE maps of single cells from ES cells, TS cells, blastoids, trophospheres 
and embryoid bodies. The colours represent the origin of cells assessed 
by FACS sorting indexes (Supplementary Table 1 sheet 8). d, Clusters 

of similar cells, as generated by the RACE-ID protocol**. The heat map 
on the right is the distance map of single cells. The clusters from the
t-SNE map and from the heat map are identified using the same colour
code. Cells were processed as described in the Methods. e, The Krt18 

and Oct4 genes are markers of the blastocyst trophectoderm and ICM 
compartments, respectively*!, which we confirmed are also valid to mark 
the compartments of blastoids using immunohistochemistry (KRT8/18 
and OCT4) and H2B-RFP+ ES cells. This image is representative of three
independent blastoid experiments. f, t-SNE map representation of key 
genes for the trophectoderm, trophoblast differentiation and placental cell 
types. g, Heat map showing expression of genes of interest: markers of the 
trophectoderm and ICM compartments by Krt18 and Oct4, as identified 
in e; markers of pluripotency and differentiation, previously identified in 
ES cells and blastocyst cells via single-cell sequencing”; genes previously 
identified as markers of WNT/β-catenin targets regulating a naive state

of pluripotency*®; transcription factors upregulated in totipotent cells“; 
genes previously identified as differentiation markers in ES cells via single- 
cell sequencing”; genes previously identified as markers differentiating ES 
cells from epiblast stem cells*°; markers of trophoblast differentiation; and 
markers of trophectoderm morphogenesis. The scale is log10.
Extended Data Fig. 7 | Immunohistochemistry of epithelial markers
within blastoids. a, Cross-section of a blastoid stained with an antibody
against E-cadherin. Maximum projection of 5 images taken with a
1-µm step. b, Phalloidin staining of the cytoskeletal molecule F-actin
and Hoechst staining of DNA show the localization of actin at cell-cell
junctions. Maximum projection of 40 images taken with a 1-µm step.
c, Antibody against the tight junction molecule ZO-1. Section of 0.5 µm
(top). Maximum projection of 4 images taken with a 0.5-µm step (bottom).
d, Antibody against the apical molecule PKCz and Hoechst staining of
DNA. Section of 1 µm (top); close up (bottom). All pictures were taken
with a PerkinElmer Ultraview VoX spinning disk microscope combined
with a Leica SP8. Images are representative of three experiments.
Extended Data Fig. 8 | Assays of embryonic and trophectoderm
compartment interactions. a, Specificity of the ES cell inductions. Yield
of formation and representative images of arrays of blastoids resulting
from the association of TS cells with ES cells, EpiSCs, human ES cells,
COS7 or C2C12 cells. n = 3 independent microwell arrays. Centre depicts
median values. Error bars denote s.d. b, Number of Fgf4 mRNA transcripts
measured by RNA sequencing in TS cells, ES cells, EpiSCs and XEN 

cells (extra-embryonic endoderm cell lines). c, Inventory of signalling 
pathways induced in trophoblasts by the embryonic cells: KEGG pathways 
differentiating the trophoblasts from blastoids and trophospheres. The 
list is exhaustive and generated using the list of statistically differentiated 
genes (P< 0.05, DESeq negative binomial distribution) with the highest 
fold changes (the 1,500 most highly upregulated and most highly 
downregulated genes). See the SORT-seq method described previously*’. 
d, Selected KEGG pathways and genes related to the MAPK signalling
pathway, TGFβ signalling pathway, cell cycle, focal adhesion and hippo
pathway. e, Trophectoderm morphogenesis. Blastoids were assessed based
on our definition of cavitated trophoblast structures comprising ES cells,
with a circularity greater than 0.9 (circularity = 4π(area/perimeter²)),
and a diameter between 70 and 110 µm (see Methods). Yield of blastoids
(percentage per microwell array) as a function of the initial mean number
of ES cells per microwell array. Horizontal bars denote mean yield. Error
bars indicate s.d. **P = 0.01, ***P = 0.001, one-way analysis of variance
(one-way ANOVA) and Tukey’s test. n = 8 independent microwell arrays.
f, The diameter of blastoids and trophospheres measured at 65h. n = 50.
P = 0.0001, two-sided Student's t-test. Centre values depict median.
Error bars denote s.d. Representative images are shown in Fig. 4a.
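The blastoid definition used in panel e (a cavitated trophoblast structure comprising ES cells, circularity above 0.9, diameter between 70 and 110 µm) can be written as a small classification rule. The sketch below is illustrative only: the `Structure` record, its field names and the inclusive treatment of the diameter bounds are assumptions of this example, and the paper's own image-analysis pipeline is described in its Methods, not here.

```python
import math
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical per-structure measurements from image analysis."""
    area_um2: float          # projected area, in square micrometres
    perimeter_um: float      # outline length, in micrometres
    diameter_um: float       # measured diameter, in micrometres
    cavitated: bool          # a cavity is present
    contains_es_cells: bool  # ES cells are enclosed by the trophoblast layer


def circularity(area: float, perimeter: float) -> float:
    """Circularity = 4*pi*(area / perimeter^2); equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def is_blastoid(s: Structure) -> bool:
    """Apply the caption's criteria; diameter bounds are assumed inclusive."""
    return (
        s.cavitated
        and s.contains_es_cells
        and circularity(s.area_um2, s.perimeter_um) > 0.9
        and 70.0 <= s.diameter_um <= 110.0
    )


# A perfectly round, cavitated structure of 90 µm diameter that contains ES cells.
round_structure = Structure(
    area_um2=math.pi * 45.0 ** 2,
    perimeter_um=2.0 * math.pi * 45.0,
    diameter_um=90.0,
    cavitated=True,
    contains_es_cells=True,
)
print(is_blastoid(round_structure))
```

Irregular outlines lower the circularity because the perimeter grows faster than the enclosed area, so elongated or lobed structures fall below the 0.9 threshold.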


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Fig. 9 image, panels a-g. Recoverable panel titles: a, single-molecule FISH of TGFβ pathway components (Nodal, Id2 and Bmp4 mRNA with Hoechst, E3.5 blastocysts); c, differentially regulated genes upon Nodal, Bmp4 or Nodal + Bmp4 stimulation; d, Wnt pathway components and Smad/β-catenin targets (gene expression, fold change); e, Id2 expression; f, Cldn4, Krt8 and Krt18 expression; g, trophectoderm cavitation.]
Extended Data Fig. 9 | Markers, transcriptional and morphogenetic
functions of TGFβ signalling. a, Single-molecule FISH for Id2, Bmp4 and
Nodal genes of the TGFβ pathway, in an E3.5 blastocyst counterstained
with Hoechst. Images are representative of five independent blastocysts.
Scale bars are 30 µm for Id2 and 50 µm for Nodal and Bmp4. b, CRISPR
strategy for the generation of Nodal deletion in ES cells. c, Venn diagram of
all genes regulated after exposure to BMP4, Nodal or both (synergy). RNA
sequencing showed that BMP4 and Nodal regulated a similar number of
genes (BMP4: 904 genes, Nodal: 926 genes; genes with P < 0.05, DESeq
negative binomial distribution), of which 30% overlapped (413 genes).
The GO analysis is shown for each group. See also Supplementary Table 2
sheets 1-4. d, RNA-sequencing analysis of the WNT-related genes in
trophospheres stimulated with activators of the TGFβ signalling pathway.
All genes are significantly regulated (P = 0.03, DESeq negative binomial
distribution). This included the ligand Wnt6 (1.6-fold, P = 0.006, DESeq
negative binomial distribution), which is expressed primarily in the cells 
surrounding the blastocyst cavity’®, the corresponding receptor Fzd7 (2.8- 
fold, P=0.002, DESeq negative binomial distribution), the intracellular 
effector Tcf4 (also known as Tcf7l2; 1.7-fold, P = 0.001, DESeq negative
binomial distribution), the negative-feedback regulator Axin1 (1.5-fold,
P = 0.005, DESeq negative binomial distribution, top), and of the reported
cooperative SMAD/β-catenin targets Msx2 (1.6-fold, P = 0.00003, DESeq
negative binomial distribution) and Ctgf (3.6-fold, P = 0.003, DESeq
negative binomial distribution)** (bottom). See also Supplementary 
Table 2 sheets 1-4. e, RNA-sequencing analysis of the TGFβ-related gene
Id2 in TS cells, ES cells, blastoids, trophospheres, blastocysts (left) and in
trophospheres after stimulation with Nodal, BMP4 or both (see Methods)
(right). n = 2 independent biological samples. The centre depicts the
mean. Error bars denote the s.d. f, Number of transcripts measured by 
RNA-sequencing analysis for markers of epithelial development Cldn4, 
Krt8 and Krt18. n=2 independent biological samples. The centre 

depicts the mean. Error bars denote the s.d. g, Yield of cystic structures 
for a combination of ES and TS cells as compared to TS cells alone 
(trophospheres) and trophospheres stimulated with BMP4 and Nodal. 
Horizontal bars denote the mean yield. Error bars are s.d. *P=0.02, 
Student’s t-test. n = 3 independent microwell arrays (see Methods). 
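The overlap quoted for the Venn diagram in panel c (BMP4: 904 genes, Nodal: 926 genes, 413 shared, described as 30%) depends on the denominator. The sketch below works through the arithmetic under the assumption that 904 and 926 are the totals for each stimulus, including the shared genes; that reading is an assumption of this example, and the function name is hypothetical.

```python
def overlap_fractions(n_a: int, n_b: int, n_shared: int) -> dict[str, float]:
    """Shared genes as a fraction of each set and of their union."""
    union = n_a + n_b - n_shared  # each shared gene is counted once
    return {
        "of_a": n_shared / n_a,
        "of_b": n_shared / n_b,
        "of_union": n_shared / union,
    }


# Counts from the caption; their interpretation as per-stimulus totals is assumed.
fractions = overlap_fractions(n_a=904, n_b=926, n_shared=413)
for name, value in fractions.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these inputs the shared genes make up roughly 29% of the union, which matches the caption's figure of 30% more closely than the per-set fractions (about 45%) do.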
Extended Data Fig. 10 | Generating and assaying blastoids with
Klf6−/− TS cells. a, Frequency distribution of the diameter of blastoids,
trophospheres and trophospheres exposed to BMP4 and Nodal (µm).
For all graphics, the red bar is the median as measured for blastoids.
b, Number of trophoblasts within blastoids, trophospheres and
trophospheres stimulated with activators of the TGFβ signalling pathway.
n = 30 independent blastoids or trophospheres. P = 0.0001, one-way
ANOVA. The same graph is shown in part in Fig. 4b. c, Effect of small-molecule
inhibitors on blastocyst cavitation. Morulae were flushed from
the oviduct of E2.5 CBA x C57BL/6 mice and cultured in M16 medium.
After the initiation of morula compaction, inhibitors or DMSO (1:1,000,
control) were added. LDN193189 (0.25 µM) was used to inhibit ALK2
and ALK3 (SMAD pathway). Blastocysts were imaged 24h after exposure.
n = 10 independent blastocysts. P = 0.07, two-sided Student's t-test. Error
bars denote s.d. d, Targeting strategy of KLF6 in TS cells and PCR gel of
the obtained clones. e, Bright-field image of wild-type and Klf6−/− TS
cells (top left). Immunostaining for CDX2 in Klf6−/− TS cells (top right).
E-cadherin immunostaining (bottom left) and quantitative PCR with
reverse transcription (qRT-PCR) (bottom right) of wild-type and Klf6−/−
TS cells. f, Representative pictures of blastoids and trophospheres, and
trophospheres stimulated with 45 ng ml⁻¹ BMP4 and 5 ng ml⁻¹ Nodal.
Red asterisks denote blastoids, which comply with our definition of cavitated
trophoblast structures comprising ES cells, with a circularity greater than
0.9 (circularity = 4π(area/perimeter²)), and a diameter between 70 and
110 µm (see Methods). Comparable results were obtained in three repeated
experiments. g, qRT-PCR for Krt8 in structures formed by combining ES
cells and wild-type or Klf6−/− TS cells. All structures (blastoids and
non-blastoids) were collected from the microwell arrays and tested. Horizontal
bars indicate mean expression. Error bars denote s.d. n = 3 pools from
independent microwell arrays. *P = 0.04, two-sided Student's t-test.
The linear ubiquitin chain assembly complex (LUBAC) is required
for optimal gene activation and prevention of cell death upon
activation of immune receptors, including TNFR1¹. Deficiency in
the LUBAC components SHARPIN or HOIP in mice results in severe 
inflammation in adulthood or embryonic lethality, respectively, 
owing to deregulation of TNFR1-mediated cell death?-®. 
In humans, deficiency in the third LUBAC component HOIL-1 
causes autoimmunity and inflammatory disease, similar to HOIP 
deficiency, whereas HOIL-1 deficiency in mice was reported to 
cause no overt phenotype? |!. Here we show, by creating HOIL-1- 
deficient mice, that HOIL-1 is as essential for LUBAC function as 
HOIP, albeit for different reasons: whereas HOIP is the catalytically 
active component of LUBAC, HOIL-1 is required for LUBAC 
assembly, stability and optimal retention in the TNFR1 signalling
complex, thereby preventing aberrant cell death. Both HOIL-1 and 
HOIP prevent embryonic lethality at mid-gestation by interfering 
with aberrant TNFR1-mediated endothelial cell death, which only 
partially depends on RIPK1 kinase activity. Co-deletion of caspase-8 
with RIPK3 or MLKL prevents cell death in Hoil-1−/− (also known
as Rbck1−/−) embryos, yet only the combined loss of caspase-8 with
MLKL results in viable HOIL-1-deficient mice. Notably, triple-knockout
Ripk3−/−Casp8−/−Hoil-1−/− embryos die at late gestation
owing to haematopoietic defects that are rescued by co-deletion of 
RIPK1 but not MLKL. Collectively, these results demonstrate that 
both HOIP and HOIL-1 are essential LUBAC components and 
are required for embryogenesis by preventing aberrant cell death. 
Furthermore, they reveal that when LUBAC and caspase-8 are 
absent, RIPK3 prevents RIPK1 from inducing embryonic lethality 
by causing defects in fetal haematopoiesis. 

To determine the physiological role of HOIL-1, we generated HOIL-1-deficient
mice by targeting exons 1 and 2 of the Hoil-1 (also known as
Rbck1) gene (Extended Data Fig. 1a–d). No mice with homozygous
deletion in the Hoil-1 gene were weaned (Fig. 1a). Analysis of Hoil-1−/−
embryos revealed that they died around embryonic day (E) 10.5
(Fig. 1a, b). This result was confirmed with a strain generated from
an independently targeted embryonic stem (ES) cell (C20Hoil-1−/−
mice) (Extended Data Fig. 1e, f). At E10.5, Hoil-1−/− embryos presented
with disrupted vascular architecture and cell death in the yolk
sac endothelium (Fig. 1c, d and Extended Data Fig. 1g, h), indicating
that the absence of HOIL-1 causes aberrant endothelial cell death.
Hoil-1fl/flTie2-cre+ embryos that lack HOIL-1 specifically in endothelial
and some haematopoietic cells also died around E10.5 with the same
abnormalities (Fig. 1e and Extended Data Fig. 1i, j). Loss of TNF or
TNFR1 diminished cell death in the yolk sac and prevented lethality at
E10.5 in Hoil-1−/− embryos (Fig. 1f and Extended Data Fig. 2a–d). As
in the Tnfr1−/−Hoip−/− (also known as Tnfrsf1a−/−Rnf31−/−) double
knockouts’, Tnfr1−/−Hoil-1−/− yolk sacs showed reduced cell death as
compared to Hoil-1−/− embryos (Fig. 1f, g). Although cell death was
not completely ablated in Tnfr1−/−Hoil-1−/− embryos, it did not appear
to significantly affect yolk sac vasculature (Fig. 1f, g and Extended Data
Fig. 2e). Nevertheless, Tnfr1−/−Hoil-1−/− embryos died at around E16.5
(Extended Data Fig. 2d, f), with heart defects before death (Fig. 1h).
Therefore, like HOIP, HOIL-1 is required to maintain blood vessel
integrity by preventing TNFR1-mediated endothelial cell death during
embryogenesis.

To understand the role of HOIL-1 in LUBAC function, we compared 
the formation of the TNFR1 signalling complex (TNFR1-SC) in mouse 
embryonic fibroblasts (MEFs) individually deficient for the LUBAC 
components. Although TNFR1-SC-associated linear ubiquitination
was merely reduced in SHARPIN-deficient MEFs’, it was completely
absent in Tnf−/−Hoil-1−/− MEFs, exactly as in Tnf−/−Hoip−/− MEFs®
(Fig. 2a). In TNF-stimulated Tnf−/−Hoil-1−/− MEFs, NF-κB activation
was attenuated (Extended Data Fig. 3a) and TNFR1 complex-II formation
was enhanced (Fig. 2b), resulting in sensitization to TNF-induced
apoptosis and necroptosis (Fig. 2c). Hence, HOIL-1 is as essential as
HOIP for linear ubiquitination within the TNFR1-SC.

To determine whether the reduction in HOIP and SHARPIN protein 
levels in HOIL-1-deficient cells was responsible for the observed loss 
of linear ubiquitination (Fig. 2a), we reconstituted HOIL-1-deficient 
MEFs with HOIP, with HOIP plus SHARPIN, or, as a control, with 
HOIL-1. Reconstitution with HOIP, either alone or with SHARPIN, 
failed to restore LUBAC recruitment, linear ubiquitination at the
TNFR1-SC, or optimal NF-κB activation. Furthermore, the reconstitution
of HOIP and/or SHARPIN was unable to prevent TNF-induced
complex-II formation and cell death, whereas the re-expression of 
HOIL-1 corrected all aforementioned defects (Fig. 2d-f and Extended 
Data Fig. 3b). In the absence of HOIL-1, HOIP was unable to bind to 
SHARPIN despite both being reconstituted to near endogenous levels 
(Extended Data Fig. 3c). Thus, HOIL-1 is required for LUBAC assem- 
bly and recruitment to the TNFR1-SC, identifying it as an essential 
component of LUBAC alongside HOIP. 

To reveal how HOIL-1 enables LUBAC activity, we generated HOIL-1-deficient
MEFs stably expressing full-length wild-type HOIL-1, the
UBL domain of HOIL-1 only (HOIL-1-UBL), HOIL-1-ΔRBR, HOIL-1-ΔUBL,
HOIL-1 with inactivating mutations T201A/R208A in the
NZF domain (HOIL-1-NZFmut) or HOIL-1 with a point mutation in
the catalytic cysteine of the RBR domain (HOIL-1-C458A) (Fig. 2g).
Except for HOIL-1-ΔUBL, all mutant HOIL-1 proteins bound to HOIP
and SHARPIN and stabilized their levels (Fig. 2h). Isolation of the
native TNFR1-SC revealed that HOIL-1-ΔRBR and HOIL-1-C458A
fully restored TNF-induced linear ubiquitination in HOIL-1-deficient
cells, whereas HOIL-1-ΔUBL did not (Fig. 2i). HOIL-1-deficient cells
Fig. 1 | HOIL-1 deficiency causes embryonic lethality at mid-gestation
due to TNFR1-mediated endothelial cell death. a, Mendelian frequencies
obtained from inter-crossing Hoil-1+/− mice. Asterisk denotes dead
embryos. b, Representative images of embryos from E9.5 to E11.5
quantified in a. Asterisks denote poor yolk sac vascularization. Scale bars,
2 mm. c, Top, representative images of yolk sac vascularization (PECAM-1,
red) and cell death (cleaved (Cl.) CASP3 staining, green) at E10.5
(n = 4 yolk sacs per genotype). Bottom, whole-mount TUNEL staining
(n = 2 yolk sacs per genotype). Scale bars, 50 µm. d, g, Quantification of
branching points and cleaved CASP3-positive cells in c and f. Data are
mean ± s.e.m. P values from unpaired two-tailed t-tests are shown. NS,
not significant. e, Top, representative images of embryos at E10.5 (n= 14
Hoil-1!"'Tie2-cre+ and n=7 Hoil- 1!" Tie2-cre+ embryos, top). Asterisk 


expressing HOIL-1-UBL or HOIL-1-NZFmut only showed partial restoration
of linear ubiquitination, correlating with reduced HOIP and
SHARPIN levels at the TNFR1-SC (Fig. 2i). Thus, the UBL domain of
HOIL-1 is essential for linear ubiquitination at the TNFR1-SC, whereas
a functional NZF domain is required for optimal LUBAC presence in
the TNFR1-SC. Expression of HOIL-1-ΔRBR restored optimal NF-κB
signalling and prevented aberrant TNF-induced cell killing in contrast
to HOIL-1-ΔUBL (Fig. 2j and Extended Data Fig. 3d). This observation
explains why the previously reported mice, regarded as deficient for 
HOIL-1, are viable as they were generated by targeting exons 7 and 8’, 
probably resembling the HOIL-1-ΔRBR mutant studied here. Because
the UBL of HOIL-1 binds to HOIP, allowing its activation!’, and the 
NZEF of HOIL-1 binds linear ubiquitin linkages}? our results provide 
evidence that HOIL-1 promotes HOIP activation as well as LUBAC 
assembly and recruitment to the TNFRI-SC via its UBL domain. Once 
linear ubiquitin chains are formed in the complex, the NZF domain of 
HOIL-1 promotes LUBAC retention by binding to these chains. 

Because both HOIL-1 and HOIP are equally important for LUBAC 
function and, consequently, for preventing aberrant cell death in vitro 
and in vivo, we used a genetic strategy to untangle the interaction 
between HOIL-1 or HOIP and the different cell death components. 
Inactivation of RIPK1 in Hoil-1−/− and Hoip−/− embryos delayed 
lethality until E14.5 (Fig. 3a and Extended Data Fig. 4a–d). At this 
time, Ripk1K45AHoil-1−/− and Ripk1K45AHoip−/− embryos had 
disrupted vascular architecture, excessive cell death in their yolk sacs, 
hearts, livers and lungs, and presented with heart defects and liver 
necrosis (Fig. 3b and Extended Data Fig. 4e–h). In accordance, TNFR1 
complex-II formation and aberrant apoptosis induced by TNF or 
lymphotoxin-α (LT-α) were only partially inhibited in RIPK1 
kinase-dead Ripk1K45AHoil-1−/− MEFs (Fig. 3c, d and Extended Data Fig. 4i). 
Thus, although the kinase activity of RIPK1 is essential for excessive 
TNFR1-induced cell death caused by attenuated LUBAC activity, as 
previously observed in SHARPIN-deficient mice’, this is not the case 
when LUBAC activity is completely abrogated. 


denotes poor yolk sac vascularization. Scale bar, 2 mm. Middle, yolk sac 
vascularization (PECAM-1, red) and apoptosis (cleaved CASP3, green). 
Scale bar, 50 μm. Bottom, yolk sac whole-mount TUNEL staining 

(n=6 Hoil-2"'Tie2-cre+ and n=2 Hoil-U"Tie2-cre* yolk sacs per 
genotype). f, Top, representative images of embryos at E15.5 (n = 6 
Tnfr1−/−Hoil-1−/− and n = 19 Tnfr1−/−Hoil-1+/− embryos). Scale bar, 
2 mm. Bottom, yolk sac vascularization (PECAM-1, red) and apoptosis 
(cleaved CASP3, green). Scale bar, 50 μm. h, Representative images of 
haematoxylin and eosin (H&E) staining on whole-embryo paraffin 
sections (n = 3 embryos per genotype). Asterisks denote pericardial 
effusion; arrows denote congested vessels. H, heart; L, lung; Li, liver. 
Scale bar, 50 μm. 


We next tested whether the loss of RIPK3, MLKL or caspase-8 
could prevent lethality in Hoip−/− and Hoil-1−/− embryos. At E10.5, 
Ripk3−/−Hoil-1−/− embryos presented with defects in vascularization 
and excessive cell death, and died at mid-gestation (Extended Data Fig. 5b, c). 
Owing to the close chromosomal linkage of HOIP and RIPK3, we 
generated Mlkl−/−Hoip−/− mice (Extended Data Fig. 5a). These 
embryos also died at mid-gestation (Extended Data Fig. 5d). Likewise, 
neither Casp8 heterozygosity nor full deletion was sufficient to 
prevent the mid-gestation lethality of Hoip−/− and Hoil-1−/− embryos 
(Extended Data Fig. 5e, f and data not shown). 

As RIPK3-mediated necroptosis may be responsible for the embryonic 
lethality of Casp8+/−Hoil-1−/− or Casp8−/−Hoil-1−/− mice'*!°, we 
generated Ripk3−/−Casp8+/−Hoil-1−/− and Ripk3−/−Casp8−/−Hoil-1−/− 
embryos and in both cases the lethality was delayed until around E14.5 
(Fig. 3e and Extended Data Fig. 6a, b). At this developmental stage, 
a single intact copy of caspase-8 was sufficient to induce 
apoptosis-driven loss of yolk sac vascularization (Fig. 3f and Extended Data 
Fig. 6c, d). Yet, although Ripk3−/−Casp8−/−Hoil-1−/− embryos died 
around E14.5, yolk sac vascularization was normalized and cell death 
in the yolk sac and other organs was prevented (Fig. 3f and Extended 
Data Fig. 6c–f). Moreover, Ripk3−/−Casp8−/−Hoil-1−/− MEFs were 
resistant to cell death induced by TNF or related cytokines (Extended 
Data Fig. 6g). Histological examination and microfocus computed 
tomography scanning revealed the presence of heart defects in 
both Ripk3−/−Casp8−/−Hoil-1−/− and Ripk3−/−Casp8+/−Hoil-1−/− 
embryos (Extended Data Fig. 6h, i). We therefore conclude that 
whereas mid-gestation lethality in Hoil-1−/− embryos is dependent 
on caspase-8/RIPK3-mediated apoptosis and necroptosis, 
Ripk3−/−Casp8−/−Hoil-1−/− embryos die at late gestation by a process 
that is independent of cell death. 

In marked contrast to Ripk3−/−Casp8−/−Hoil-1−/− mice, both 
Mlkl−/−Casp8−/−Hoil-1−/− and Mlkl−/−Casp8−/−Hoip−/− mice 
were born, albeit at lower than expected Mendelian ratios (Fig. 3g 
and Extended Data Fig. 7a). These mice were runted and had to be 
Fig. 2 | The UBL domain but not the RBR domain of HOIL-1 is 
essential for LUBAC activity at the TNFR1-SC and to prevent TNF/ 
TNFR1-induced cell death. a, d, TNFR1-SC pull-down by Flag 
immunoprecipitation (IP) in MEFs derived from mice of the indicated 
genotypes ± Flag-TNF for 15 min (n = 2 independent experiments) 
(a) and reconstituted with HOIL-1, HOIP or HOIP and SHARPIN 
(n = 4 independent experiments) (d). b, e, Immunoprecipitation of the 
adaptor protein FADD in MEFs of the indicated genotypes treated for 
4 h with the caspase inhibitor zVAD-fmk ± TNF (b) and reconstituted 
as indicated (e) (n = 2 independent experiments (b, e)). c, f, j, Cell death 
analysed by propidium iodide (PI) staining in MEFs with the indicated 

euthanized by 4-5 weeks of age (Fig. 3h, Extended Data Fig. 7b, c). 
Histopathological analysis revealed severe inflammation in the liver 
and lungs (Extended Data Fig. 7d and data not shown). Of note, 
the indicated inhibitors zVAD-fmk (zVAD) and/or 
necrostatin-1 (Nec-1s) for 24 h (c), reconstituted (f) or transduced (j) as 
indicated. Mean ± s.e.m. (n = 3 independent experiments) and P values 
from two-way ANOVA are shown. g, Schematic overview of HOIL-1 
constructs used to transduce Tnf−/−Hoil-1−/− MEFs. WT, wild type. 
h, Flag immunoprecipitation of indicated HOIL-1 mutants (n = 2 
independent experiments). i, Endogenous TNFR1-SC pull-down 
by haemagglutinin (HA) immunoprecipitation in reconstituted 
Tnf−/−Hoil-1−/− MEFs ± HA-TNF for 15 min (n = 2 independent 
experiments). EV, empty vector; NT, not treated; TL, total lysate. For gel 
source data (a, b, d, e, h, i), see Supplementary Fig. 1. 


Casp8 heterozygosity resulted in increased apoptosis of endothelial 
cells, causing lethality in both Mlkl−/−Casp8+/−Hoip−/− and 
Mlkl−/−Casp8+/−Hoil-1−/− embryos around E14.5 (Extended Data 
Fig. 3 | Concomitant loss of MLKL and caspase-8, but not loss of RIPK1 
kinase activity or combined loss of RIPK3 and caspase-8, promotes 
survival of LUBAC-deficient mice. a, Representative images of E10.5 
(n = 6 embryos per genotype), E14.5 (n = 12 Ripk1K45AHoil-1+/−, n = 5 
Ripk1K45AHoil-1−/− embryos) and E15.5 embryos (n = 3 
embryos per genotype). Scale bars, 2 mm (E10.5) and 5 mm (E14.5, 
E15.5). Asterisk denotes poor yolk sac vascularization. b, f, Representative 
images of yolk sac vascularization (PECAM-1, red) and apoptosis 
(cleaved (Cl.) CASP3, green) at E14.5 (b) or E13.5 (f) and quantification. 
Mean ± s.e.m. and P values from unpaired two-tailed t-tests (b) or 
one-way ANOVA (f) are shown. Scale bar, 50 μm. c, Immunoprecipitation 
of the adaptor protein FADD in MEFs treated for 3 h with TNF and 
zVAD-fmk (n = 2 independent experiments). For gel source data, see 
Supplementary Fig. 1. d, Cell death measured by propidium iodide (PI) 


Fig. 7e and data not shown), indicating that caspase-8-driven apoptosis 
is sufficient to cause embryonic death of LUBAC-deficient embryos. 

Co-deletion of RIPK3 and caspase-8 causes embryonic lethality 
in otherwise viable SHARPIN-deficient cpdm (chronic proliferative 
dermatitis mice, also known as Sharpincpdm) mice’. However, 
Mlkl−/−Casp8−/−Sharpincpdm mice were viable and the inflammatory 
syndrome that characterizes Sharpincpdm mice was prevented (Fig. 3i 
and Extended Data Fig. 7f, g), while expectedly’® developing 
lymphadenopathy and splenomegaly (Fig. 3i and Extended Data Fig. 7f). Thus, 
the combined loss of any of the three LUBAC components together 
with the loss of caspase-8 uncovers a vital functional difference between 
RIPK3 and MLKL. 

We next evaluated whether the lethality of Ripk3−/−Casp8−/− 
Hoil-1−/− mice is due to aberrant (RIPK3-independent) MLKL 


[Figure panel: Mendelian frequencies. Born: +/+, 24; +/−, 32; −/−, 0; total, 56. A second row label, ~55 days old, carries no recoverable values.] 


incorporation in MEFs treated with TNF (10 ng ml−1) or LT-α, or not 
treated (NT). Data are mean ± s.e.m. (n = 3 independent experiments). 
****P < 0.0001, two-way ANOVA. e, Representative images of E14.5 
(n = 11 Ripk3−/−Casp8−/−Hoil-1+/−, Ripk3−/−Casp8+/−Hoil-1−/− 
and n = 7 Ripk3−/−Casp8−/−Hoil-1−/−) and E15.5 (n = 5 
Ripk3−/−Casp8−/−Hoil-1+/−, n = 4 Ripk3−/−Casp8+/−Hoil-1−/− and 
n = 8 Ripk3−/−Casp8−/−Hoil-1−/−) embryos. Asterisk denotes poor 
yolk sac vascularization. Scale bar, 5 mm. g, j, Mendelian frequencies 
obtained from intercrossing Mlkl−/−Casp8+/−Hoil-1+/− with 
Mlkl−/−Casp8−/−Hoil-1+/− mice (g) or Mlkl+/−Ripk3−/−Casp8−/−Hoil-1+/− 
with Mlkl−/−Ripk3−/−Casp8−/−Hoil-1+/− mice (j, top) or 
Mlkl−/−Ripk3−/−Casp8−/−Hoil-1+/− mice (j, bottom). Asterisk denotes 
dead embryo. h, i, Representative images of adult mice quantified in g (h), 
or n = 3 mice per genotype in i. m denotes cpdm mutation. 


activation. This was particularly pertinent because MLKL levels 
were increased in Ripk3−/−Casp8−/−Hoil-1−/− embryos and 
MLKL was aberrantly activated in some of them (Extended Data 
Fig. 7h). However, MLKL co-deficiency did not prevent the death of 
Ripk3−/−Casp8−/−Hoil-1−/− embryos (Fig. 3j). Thus, RIPK3 is required 
for the survival of embryos in the absence of LUBAC by regulating an 
MLKL-independent process. 

To explore the nature of the pro-survival role of RIPK3, we performed 
RNA sequencing (RNA-seq) on E13.5 Ripk3−/−Casp8−/−Hoil-1−/−, 
Mlkl−/−Casp8−/−Hoil-1−/− and control embryos (Extended Data 
Fig. 8a and Supplementary Table 1). Gene Ontology (GO) enrichment 
analysis of differentially expressed genes pointed towards defects in 
erythropoiesis in Ripk3−/−Casp8−/−Hoil-1−/− embryos (Extended 
Data Fig. 8b). Indeed, reduced levels of erythroid lineage TER119+ cells 
Fig. 4 | Combined deletion of RIPK3 and caspase-8 causes 
haematopoietic defects and RIPK1-dependent embryonic lethality in 
HOIL-1-deficient mice. a, b, Number of TER119+ (erythroid) cells (a) 
and enucleated erythrocytes per high-power field (HPF) (b) in E13.5 fetal 
livers with the indicated genotypes. Data are mean ± s.e.m. P values from 
unpaired two-tailed t-tests are shown. c, Differentiation of E13.5 fetal liver 
(c-KIT+) progenitors into erythroid burst-forming units (BFU-E). 
Data are mean ± s.e.m. P values from unpaired two-tailed t-tests are 
reported. d, Percentage of haematopoietic progenitors negative for 


(Fig. 4a), basophilic erythroblasts (Extended Data Fig. 8c) and mature 
erythrocytes (Fig. 4b) were observed in Ripk3−/−Casp8−/−Hoil-1−/− 
fetal livers. Furthermore, Ripk3−/−Casp8−/−Hoil-1−/− haematopoietic 
progenitors failed to differentiate into committed erythroid 
burst-forming units (BFU-E) in culture (Fig. 4c). Further analysis 
of the haematopoietic compartment from E13.5 fetal livers revealed 
abnormally reduced percentages and total numbers of multipotent 
progenitors (Fig. 4d and Extended Data Fig. 8d, e) as well as 
leucocytes, including granulocytes and macrophages, and myeloid 
progenitors in the Ripk3−/−Casp8−/−Hoil-1−/− embryos compared 
to controls, whereas Mlkl−/−Casp8−/−Hoil-1−/− embryos had 
normal numbers of these cells (Extended Data Fig. 8f–k). In 
addition, the capacity of haematopoietic progenitors to generate 
colony-forming myeloid progenitors and multipotent progenitors 
was also impaired in the Ripk3−/−Casp8−/−Hoil-1−/− embryos 
(Extended Data Fig. 8l). Accordingly, the viability of macrophages 
obtained from Ripk3−/−Casp8−/−Hoil-1−/− fetal liver cell suspensions 
in culture was significantly lower than that of controls and 
this could not be rescued by inhibiting necroptosis or apoptosis. 
Mlkl−/−Casp8−/−Hoil-1−/− fetal liver cells, however, produced normal 
numbers of macrophages (Extended Data Fig. 4m). Despite the 
heart defects of Ripk3−/−Casp8−/−Hoil-1−/− embryos, blood 
circulation was normal at E13.5 and the percentages of CD45+cKIT+ 
cells obtained from aorta-gonad-mesonephros (AGM) regions 
were comparable between Ripk3−/−Casp8−/−Hoil-1−/− embryos and 
controls at E11.5 (Extended Data Fig. 8o, p). We therefore conclude that 
Ripk3−/−Casp8−/−Hoil-1−/− embryos have defective early 
haematopoiesis, probably downstream of specification in the AGM, resulting 
in substantial deficiencies in erythroid and myeloid cells. 

Because LUBAC is known to regulate RIPK1 17,18, we investigated 
the role of RIPK1 in the lethality of Ripk3−/−Casp8−/−Hoil-1−/− 
embryos. The lethality of Ripk3−/−Casp8−/−Hoil-1−/− embryos was 
prevented by additional loss of RIPK1, despite RIPK1 levels being 
relatively low in Ripk3−/−Casp8−/−Hoil-1−/− embryos and RIPK1 
deficiency failing to prevent Hoil-1−/− embryonic lethality (Fig. 4d, e and 
Extended Data Figs. 7h, 9a, b). Importantly, the viability of macrophages 
obtained from Ripk1−/−Ripk3−/−Casp8−/−Hoil-1−/− fetal 
livers was comparable to controls (Extended Data Fig. 9c), 
indicating normalized haematopoiesis in these mice. The expression 
of several cytokines, including IL-1β, CCL2, IFN-β and CXCL10, 
mature lineage markers (Lin−) and SCA-1+c-KIT+ (LSK) and 
SCA-1−c-KIT+ (LK) in E13.5 fetal livers with the indicated genotypes. 
Data are mean ± s.e.m. P values from unpaired two-tailed t-tests 
are reported. e, Mendelian frequencies obtained from intercrossing 
Ripk1−/−Ripk3−/−Casp8−/−Hoil-1+/− mice. f, Representative images of 
mice of the indicated genotypes quantified in e. g, Cytokine levels in 
embryo homogenates with the indicated genotypes. Data are mean ± s.e.m. 
P values from one-way ANOVA are reported. 


was abnormally increased in Ripk3−/−Casp8−/−Hoil-1−/− embryos 
but not in Ripk1−/−Ripk3−/−Casp8−/−Hoil-1−/− embryos (Fig. 4f 
and Extended Data Fig. 9d, e). The function, survival, differentiation 
and self-renewal of haematopoietic progenitors are greatly 
impacted by several of these cytokines'®”°. Therefore, our findings 
suggest that RIPK1-driven deregulated cytokine production in 
Ripk3−/−Casp8−/−Hoil-1−/− embryos may impair fetal 
haematopoiesis. Finally, the treatment of pregnant females with the RIPK1 
kinase inhibitor GSK3540547A (GSK’547A)?! did not prevent 
lethality of Ripk3−/−Casp8−/−Hoil-1−/− embryos, although it was 
able to extend the survival of Ripk3−/−Casp8+/−Hoil-1−/− embryos 
(Extended Data Fig. 9f). These results suggest that the lethality of 
Ripk3−/−Casp8−/−Hoil-1−/− embryos probably depends on the 
scaffolding function of RIPK1. 

Although RIPK1 is required for emergency haematopoiesis, RIPK1 
might regulate embryonic haematopoiesis differently. Indeed, 
RIPK1-constitutive or RIPK1-haematopoietic-cell-specific-deficient mice 
are not embryonically lethal??*. In addition, the absence of LUBAC, 
RIPK3 and caspase-8 might affect mechanisms during embryogenesis 
that are different from those perturbed by RIPK1 deficiency alone. 
Collectively, our findings indicate that in the combined absence of 
LUBAC and caspase-8, RIPK3 exerts a pro-survival role by regulating 
RIPK1-mediated signalling (Extended Data Fig. 10). Because 
Ripk3−/−Casp8−/− mice are viable!*!>4, our findings indicate that the 
control of RIPK1 by either LUBAC or RIPK3 is sufficient to enable 
proper haematopoiesis in the developing embryo, probably by 
preventing deregulated cytokine production. Thus, LUBAC and RIPK3 control 
RIPK1-mediated signalling to allow embryonic haematopoiesis. 
METHODS 


Mice. The Hoil-1-floxed (Hoil-1fl/fl) mice were generated by a gene-targeting 
strategy in ES cells in which the targeting cassette was composed of a 
hygromycin-resistance cassette flanked by Frt sites and exons 1 and 2 of the Hoil-1 gene 
flanked by loxP sites. Southern blots of C57BL/6 ES cell clones containing the 
homologous recombination were analysed for the specificity of the recombination 
and the absence of any unwanted integration. Two ES cell clones were used 
to generate mutant animals on the C57BL/6 genetic background, corresponding 
to the two independent Hoil-1−/− strains (Hoil-1−/− and C20Hoil-1−/−). 
The hygromycin cassette was removed by crossing these mice with C57BL/6 
mice expressing the FlpE recombinase and this was followed by a cross with 
C57BL/6 mice to remove the flpe transgene. Hoip−/− and Hoil-1−/− mice were 
generated by crossing Hoipfl/fl mice, previously described'’, and Hoil-1fl/fl mice 
(described here) with transgenic mice expressing the loxP-deleter Cre 
recombinase (purchased from JAX: 6054, B6.C-Tg(CMV-Cre)1Cgn/J). Transgenic 
mice expressing the Cre recombinase under the control of the Tie2 (also known 
as Tek) promoter (Tie2-Cre) (B6.Cg-Tg(Tek-cre)1Ywa/J)> were used to delete 
floxed genes specifically in endothelial cells. C57BL/6 Mlkl−/− mice crossed to 
Sharpincpdm mice were previously described”®. For all other crosses Mlkl−/− mice 
were generated using transcription activator-like effector nuclease (TALEN). In 
brief, TALENs targeting exon 1 of the Mlkl gene were cloned via Golden Gate 
assembly. The RVD sequence of TAL1 against TACCGTTTCAGATGTCA was 
NIHDHDNNNGNGNGHDNINNNINGNNNGHDNI, and TAL2 against 
TCGATCTTCCTGCTGCC was HDNNNINGHDNGNGHDHDNGNNHDNGNNHDHD. 
Capped RNA was produced in vitro using mMESSAGE mMACHINE 
T7 Transcription Kit (Ambion) and a poly(A) tail was added using Poly(A) Tailing 
Kit (Ambion). Purified transcripts were mixed and adjusted to 25 ng’. C57BL/6 
fertilized eggs were injected into both the cytoplasm and the pro-nucleus. Embryos 
were transferred into C57BL/6 pseudo-pregnant females. Pups were genotyped 
by sequencing using genomic DNA obtained from ear punches. One female 
carrying a 19-base pair (bp) homozygous deletion causing a premature stop codon 
was selected for further breeding. Mlkl−/− mice were backcrossed to C57BL/6 
mice for two generations. Sharpincpdm (C57BL/Ka) and Tnfr1−/− (2818, 
B6.129-Tnfrsf1atm1Mak/J) mice were purchased from JAX. Tnf−/− mice (C57BL/6;129S6) 
were provided by W. Kaiser. Ripk3−/−?’, Casp8−/−*8, Ripk1K45A and Ripk1−/− 
mice have been reported previously. Timed matings were performed as previously 
described*. All mice were genotyped by PCR and fed ad libitum. All animal 
experiments were conducted under an appropriate UK project license in accordance 
with the regulations of the UK Home Office for animal welfare according to ASPA 
(Animals (Scientific Procedures) Act 1986). The relevant Animal Ethics Committee 
approved all experiments involving Sharpincpdm and the Ripk1−/− crosses, which 
were maintained under appropriate licenses and subject to ethical review at The 
Walter and Eliza Hall Institute (Melbourne, Australia) and UT Health Sciences 
Center San Antonio (TX, USA), respectively. 
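
The two RVD strings quoted above can be checked against their stated target sites with a short script. This is only an illustrative sketch: it assumes the commonly used RVD-to-base code (NI for A, HD for C, NN for G, NG for T) and assumes that the first T of each target is the conventional 5' thymine that is not encoded by a repeat. The function name `decode_rvds` is hypothetical and is not part of the original methods.

```python
# Commonly used TALE repeat-variable di-residue (RVD) code.
# Assumption: NN is treated as G here, although it can also bind A.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}


def decode_rvds(rvd_string):
    """Translate a concatenated RVD string into the DNA bases it specifies."""
    if len(rvd_string) % 2 != 0:
        raise ValueError("RVD string must contain an even number of letters")
    pairs = [rvd_string[i:i + 2] for i in range(0, len(rvd_string), 2)]
    return "".join(RVD_TO_BASE[pair] for pair in pairs)


# RVD strings and target sites as given in the Mice section.
tal1_site = "T" + decode_rvds("NIHDHDNNNGNGNGHDNINNNINGNNNGHDNI")
tal2_site = "T" + decode_rvds("HDNNNINGHDNGNGHDHDNGNNHDNGNNHDHD")

print(tal1_site)  # compare with TACCGTTTCAGATGTCA
print(tal2_site)  # compare with TCGATCTTCCTGCTGCC
```

Under these assumptions both decoded strings reproduce the 17-nucleotide targets listed in the text, which also confirms that the TAL2 sequence split across a line break should be read as one continuous string.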

Histological analysis, TUNEL and immunofluorescence staining. Embryos or 
organs from adult mice were collected and fixed in 10% buffered formalin and 
paraffin embedded. Sections of 4 μm were stained with haematoxylin and eosin 
following standard procedures. Necropsy of adult mice or six sagittal serial sec- 
tions of two different planes of the embryo were used for blinded pathological 
analysis. For TUNEL staining, sections were treated according to the manufac- 
turer’s instructions (DeadEnd Fluorometric TUNEL System, Promega, G3250). 
For whole-mount TUNEL staining and immunofluorescence staining, samples 
were processed using the ApopTag Plus Peroxidase In Situ Apoptosis Detection 
Kit (Millipore, S7101) according to the manufacturer’s instructions and as 
previously described®. Quantification was performed by an experimenter blinded to 
the genotype of the mice by using ImageJ Software on monochrome images of the 
whole yolk sac by measuring the area of positive staining. Alternatively, TUNEL- 
positive cells were counted on five different fields (10x magnification). Yolk sacs 
were stained with antibodies against PECAM-1 (BD Biosciences, 5533370 clone 
MEC13.3) and cleaved caspase 3 (Cell Signaling, 9664), followed by staining with 
secondary antibodies, Alexa Fluor 594 goat anti-rat IgG and Alexa Fluor 488 goat 
anti-rabbit IgG (Invitrogen, A-11007 and A-11034, respectively), and analysed by 
fluorescent microscopy. Quantification was performed by an experimenter blinded 
to the genotype of the mice on ten different fields (10x magnification) per yolk sac. 
Microfocus computed tomography scan. Embryos were fixed in 4% paraform- 
aldehyde and potassium triiodide (Lugol's iodine/I2KI, to impart tissue contrast), 
with a total iodine content of 63.25 mg ml−1 (iodine mass of 2.49 × 10−4 mol ml−1), 
in a 1:1 ratio for 8 h before imaging. Before scanning, the embryos were washed, 
wrapped in Parafilm M (Bemis) and secured in 3% (w/v) Agar (Sigma-Aldrich), 
within a low-density plastic cylinder to ensure mechanical stability during scan 
acquisition. Images were acquired using an XT H 225 ST microfocus-computed 
tomography scanner with a multimetal target (Nikon Metrology). Scans were recon- 
structed using modified Feldkamp filtered back projection algorithms with propri- 
etary software (CTPro3D; Nikon Metrology) and post-processed using VG Studio 


MAX (Volume Graphics GmbH). Soft tissues were analysed by Phong shading 
of direct volume renderings and plain projections and the vascular system by 
maximum intensity projections. 
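
As a quick consistency check on the iodine concentration quoted for the contrast stain, the mass concentration can be converted to a molar amount. The sketch below assumes the molar figure refers to molecular iodine (I2, roughly 253.8 g per mol); that assumption, and the helper name `iodine_mol_per_ml`, are illustrative and not stated in the original methods.

```python
# Standard atomic mass of iodine in g/mol; I2 contains two iodine atoms.
IODINE_ATOMIC_MASS = 126.904
I2_MOLAR_MASS = 2 * IODINE_ATOMIC_MASS


def iodine_mol_per_ml(mass_mg_per_ml):
    """Convert an iodine mass concentration (mg/ml) to mol/ml, assuming I2."""
    grams_per_ml = mass_mg_per_ml / 1000.0
    return grams_per_ml / I2_MOLAR_MASS


molar_amount = iodine_mol_per_ml(63.25)
print(f"{molar_amount:.3e} mol per ml")
```

With these inputs the result is close to 2.49 × 10−4 mol per ml, which matches the value given alongside the 63.25 mg per ml figure.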

Cells. MEFs were isolated from E12.5-E13.5 embryos in accordance with standard 
procedures and these cells were maintained in DMEM medium supplemented 
with 10% fetal bovine serum (Sigma). Transformation was performed by lentiviral 
infection with the SV40 large T antigen. For reconstitution experiments, the coding 
sequence of mouse HOIP, SHARPIN or HOIL-1 wild-type (WT), the UBL domain 
of HOIL-1 only (HOIL-1-UBL; amino acids 1-139), HOIL-1-ΔRBR (amino acids 
1-252), HOIL-1-ΔUBL (amino acids 140-508), HOIL-1 with inactivating 
mutations T201A/R208A in the NZF domain (HOIL-1-NZFmut) or HOIL-1 with a 
point mutation in the catalytic cysteine of the RBR domain (HOIL-1-C458A) was 
inserted in MSCV vector followed by the internal ribosome entry site (IRES)-GFP 
sequence. These vectors were retrovirally transduced into MEFs and GFP-positive 
cells were sorted in a MoFlo cytometer (Beckman Coulter). 
Immunoprecipitation. For isolation of the TNFR1-SC, transformed MEFs were 
stimulated with 3×Flag-2×Strep-TNF at 0.5 μg ml−1 for 15 min, and controls 
were left untreated. Cells were subsequently solubilised in lysis buffer (30 mM 
Tris-HCl (pH 7.4), 150 mM NaCl, 2 mM EDTA, 2 mM KCl, 10% glycerol, 1% 
Triton X-100, EDTA-free proteinase inhibitor cocktail (Roche, 5056489001) and 
1× phosphatase-inhibitor cocktail 2 (Sigma, P5726-1ML)) at 4 °C for 30 min. The 
lysates were cleared by centrifugation, and 3×Flag-2×Strep-TNF (0.5 μg ml−1 
per sample) was added to the untreated samples. Subsequently, the lysates were 
subjected to anti-Flag immunoprecipitation using M2 antibody-coupled 
sepharose beads (Sigma, A2220-5ML) for 16 h. For immunoprecipitation of FADD, 
transformed MEFs were treated with 20 μM zVAD-fmk (Abcam, ab120487) in 
the presence or absence of 100 ng ml−1 6×His-TNF for 3 h. Cells were lysed as 
described above and FADD was immunoprecipitated using anti-FADD antibody 
(Santa Cruz, sc-5559) and protein G Sepharose beads (GE Healthcare, 
17-0618-01) at 4 °C for 4 h. For SHARPIN immunoprecipitation, anti-SHARPIN antibody 
(ProteinTech, 14626-1-AP) was used. For all immunoprecipitations, the beads 
were washed three times with lysis buffer. Proteins were eluted in 50 μl of LDS 
buffer (NuPAGE, Invitrogen) containing 50 mM dithiothreitol (DTT). Samples 
were analysed by western blotting. 

Western blot analysis and antibodies. Whole embryos were snap-frozen and 
homogenized in RIPA buffer (50 mM Tris pH 8.0, 150 mM NaCl, 0.5% sodium 
deoxycholate, 1% NP-40 and 1× EDTA-free proteinase inhibitor cocktail (Roche, 
5056489001)) or RIPA buffer with 6 M urea for the experiment in Extended Data 
Fig. 7h. Alternatively, cells were washed twice with ice-cold PBS before lysis in 
lysis buffer. Protein concentration of lysates was determined using BCA protein 
assay (Thermo Scientific). Lysates were subsequently denatured in reducing 
sample buffer at 95 °C for 10 min before separation by SDS-PAGE (NuPAGE) and 
subsequent analysis by western blotting using antibodies against HOIL-1°°, HOIP 
(custom-made, Thermo Fisher Scientific), SHARPIN (ProteinTech, 14626-1-AP), 
TNFR1 (Abcam, ab19139), actin (Sigma, A1978), pIκBα (Cell Signaling, 9246), 
IκBα (Cell Signaling, 9242), phospho-p65 (Cell Signaling, 3033), cleaved caspase-8 
(Cell Signaling, 9429), linear ubiquitin (Merck Millipore, MABS199), RIPK1 (BD, 
610459), RIPK3 (Enzo, ADI-905-242-100), FADD (Assay Design, AAM-121), 
MLKL (Millipore, MABC604), phospho-MLKL (Abcam, ab196436) and tubulin 
(Sigma, T9026). 

Cell death analysis by propidium iodide staining. Cells were seeded to 80% 
confluence and were then incubated with 100 ng ml−1 His-tagged TNF, 1 μg ml−1 
CD95L-Fc, 1 μg ml−1 isoleucine zipper-tagged murine TRAIL (iz-mTRAIL), 
100 μg ml−1 poly(I:C) HMW (InvivoGen, tlrl-pic), 20 ng ml−1 IFN-γ (Peprotech, 
315-05) or 100 ng ml−1 LT-α (Thermo Fisher Scientific, 10270-HNAE) for 24 h, 
unless otherwise indicated. When indicated the following inhibitors were used: 
20 μM zVAD-FMK (Abcam, ab120487), 10 μM necrostatin-1 (Biovision, 2263-5). 
Supernatants and adherent cells were collected and resuspended in PBS containing 
5 μg ml−1 propidium iodide. Propidium iodide-positive cells were enumerated by 
FACS (BD Accuri). 

RNA-sequencing analysis. E13.5 embryos were snap frozen and RNA was pre- 
pared using the RNeasy minikit (Qiagen, 74104) according to the manufacturer’s 
instruction. To generate the library, samples were processed using the KAPA 
mRNA HyperPrep Kit (KK8580) according to the manufacturer's instructions. 
In brief, mRNA was isolated from total RNA using Oligo dT beads to pull down 
poly-adenylated transcripts. The purified mRNA was fragmented using chemical 
fragmentation (heat and divalent metal cation) and primed with random hexamers. 
Strand-specific first strand cDNA was generated using reverse transcriptase in the 
presence of actinomycin D. The second cDNA strand was synthesized using dUTP 
in place of dTTP, to mark the second strand. The resultant cDNA was then ‘A-tailed’ 
at the 3’ end to prevent self-ligation and adaptor dimerization. Truncated adaptors, 
containing a T overhang were ligated to the A-tailed cDNA. Successfully ligated 
cDNA molecules were then enriched with limited cycle PCR. Libraries to be multi- 
plexed in the same run were pooled in equimolar quantities, calculated from Qubit 
and Bioanalyser fragment analysis. Samples were sequenced on the NextSeq 500 
instrument (Illumina) using a 43-bp paired-end run. Run data were de-multiplexed 
and converted to fastq files using the Illumina bcl2fastq Conversion Software v2.18 
on BaseSpace. Fastq files were then aligned to a reference genome using STAR on 
the BaseSpace RNA-Seq alignment app v1.1.0. Reads per transcript were counted 
using HTSeq and differential expression was estimated using the BioConductor 
package DESeq2 (BaseSpace app v1.0.0). Next, four groups of differentially 
regulated genes were analysed: low and high abundance in Ripk3−/−Casp8−/−Hoil-1+/− 
versus Mlkl−/−Casp8−/−Hoil-1+/− embryos and low and high abundance in 
Ripk3−/−Casp8−/−Hoil-1−/− versus Mlkl−/−Casp8−/−Hoil-1−/− embryos. To 
identify genes that were specifically altered in the absence of HOIL-1, the 
Venny 2.1 software was used to exclude genes that were differentially expressed 
between Ripk3−/−Casp8−/−Hoil-1+/− and Mlkl−/−Casp8−/−Hoil-1+/− embryos 
from those between Ripk3−/−Casp8−/−Hoil-1−/− and Mlkl−/−Casp8−/−Hoil-1−/− 
embryos. Genes that were already differentially expressed between the 
corresponding HOIL-1-expressing controls (that is, Ripk3−/−Casp8−/−Hoil-1+/− 
and Mlkl−/−Casp8−/−Hoil-1+/− embryos) were excluded from the 
differentially expressed genes between Ripk3−/−Casp8−/−Hoil-1−/− and 
Mlkl−/−Casp8−/−Hoil-1−/− embryos. The resulting list of genes (33/85) was entered 
in the STRING software (https://string-db.org) to assess for functional enrichment 
in biological networks. Gene Ontology (GO) terms with false discovery rate below 
1% are shown. 
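
The exclusion step described above amounts to a set difference between two lists of differentially expressed genes. The following is a minimal sketch of that logic only; the authors used the Venny 2.1 tool, and the function name and gene identifiers below are hypothetical placeholders rather than data from the study.

```python
def hoil1_specific_genes(de_in_knockouts, de_in_controls):
    """Keep genes that differ between the HOIL-1-deficient genotypes but
    not between the corresponding HOIL-1-expressing control genotypes."""
    return sorted(set(de_in_knockouts) - set(de_in_controls))


# Placeholder gene identifiers, for illustration only.
de_knockouts = ["geneA", "geneB", "geneC", "geneD"]
de_controls = ["geneB", "geneD", "geneE"]

specific = hoil1_specific_genes(de_knockouts, de_controls)
print(specific)
```

In this toy input, geneB and geneD are dropped because they already differ between the control genotypes, leaving geneA and geneC as candidates for enrichment analysis.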

Flow cytometry analysis, colony-forming unit assay and macrophage culture. 
For phenotypic analysis, single-cell suspensions from mechanically dissociated 
E13.5 fetal livers or a pool of aortas (AGM region) from three embryos were stained 
for 30 min on ice with various antibody cocktails. The antibodies against the surface 
markers examined were: CD16/32, clone 93 and 2.4G2 (eBioscience, 45-0161-82 
and BD553141), CD135, clone A2F10.1 (BD, 553842), Ly-6A/E, clone D7 (Sca-1) 
(BD, 558162), CD117 (c-Kit), clone 2B8 (BD, 560185), CD34, clone RAM34 
(BD, 562608), mouse lineage cocktail, clones 17A2/RB6-8C5/RA3-6B2/Ter-119/ 
M1/70 (Biolegend, 133313 and BD, 561301), CD16/32, clone 2.4G2 (BioXcell, 
CUS-HB-197), CD11b, clone M1/70 (Biolegend, 101228 and eBioscience, 15-0112- 
81), CD11c, clone HL3 (BD, 561241), F4/80, clone BMB8 (Biolegend, 123110), GR-1, 
clone RB6-8C5 (Biolegend, 108416 and 108410), CD45, clone 30-F11 (Biolegend, 
103128 and Biolegend, 103112), CD3e, clone, 145-2C11 (Biolegend, 100310), B220, 
clone RA3-6B2, (Biolegend, 103210), CD71, clone RI7217 (Biolegend, 113807), 
TER-119, clone TER-119 (Biolegend, 116234) and fixable viability dye 
(eBioscience, 65-0864-18 and 65-0867-14). The myeloid progenitors were identified 
in the LK population as CD34+CD16/32− (CMP), CD34+CD16/32+ (GMP) and 
CD34−CD16/32− (MEP). Fluorescence minus one (FMO) was used as a gating 
control. For quantification of absolute number of cells, a defined number of flow 
cytometric reference beads (Invitrogen) were mixed with the samples before 
acquisition. Samples were processed either using LSR Fortessa (BD Biosciences) 
or sorted in a FACSAria FUSION cell sorter (BD Biosciences). Data were analysed 
with FlowJo 7.6.1 software (Treestar). Cytospin preparations of 10,000 cells per 
slide of E13.5 fetal liver homogenates were stained by May-Grünwald Giemsa 
staining and enucleated erythrocytes were quantified blindly as number of cells 
per HPF using ImageJ software. For growth of primitive erythroid progenitor 
cells or all haematopoietic stem cells, 5,000 sorted Lin−c-KIT+ E13.5 liver cells 
were cultured in MethoCult SF containing cytokines, including EPO (Stem Cell, 
M3436) or Mouse Methylcellulose Complete Media (R&D, HSC007), respectively. 
Colonies were enumerated after 14 days of incubation. For preparation of fetal 
liver-derived macrophages, equal amounts of E13.5 single cell suspensions were 
cultured and differentiated for 5 days in DMEM supplemented with 10% FCS plus 
20% L929-conditioned medium (as a source of M-CSF) supplemented or not with 
the indicated inhibitors. Cells were imaged using EVOS Auto cell imaging system 
and viability was measured using the CellTiter-Glo Luminescent Cell Viability 
Assay (Promega, G7572). Alternatively, cells were stained with Hoechst dye and 
enumerated using a Cytation cell imaging platform. 
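
The myeloid progenitor definitions given in this section (CMP, GMP and MEP within the LK population) can be written as a small decision rule. This is a sketch of the gating logic as stated in the text, with marker positivity reduced to booleans; real gating uses FMO-based thresholds on fluorescence intensity, and the function name is hypothetical.

```python
def classify_lk_progenitor(cd34_positive, cd16_32_positive):
    """Assign an LK-gated cell to CMP, GMP or MEP from CD34 and CD16/32 status."""
    if cd34_positive and not cd16_32_positive:
        return "CMP"  # CD34+ CD16/32-
    if cd34_positive and cd16_32_positive:
        return "GMP"  # CD34+ CD16/32+
    if not cd34_positive and not cd16_32_positive:
        return "MEP"  # CD34- CD16/32-
    # CD34- CD16/32+ is not assigned to any population in the text.
    return "unassigned"


for cd34, cd16_32 in [(True, False), (True, True), (False, False), (False, True)]:
    print(cd34, cd16_32, classify_lk_progenitor(cd34, cd16_32))
```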

Cytokine analysis. Embryo homogenates prepared as described above (‘Western
blot and antibodies’ section) were analysed with Proteome Profiler Arrays (Mouse
Angiogenesis Array, ARY015, and Mouse Cytokine Array Panel A, ARY006; both
R&D). ELISA kits used were CXCL4 (R&D, DY595), CXCL11 (Abcam,
ab204519), CXCL10 (R&D, DY466-05), IFN lambda 2/3 (PBL Assay Science, 62830-1),
IL-1β (ThermoFisher, BMS6002) and IFN-β (ThermoFisher, 424001).
Epidermal thickness quantification. Per mouse, 1-2 pieces of skin were taken
and epidermal thickness was measured by microscopy at 20× magnification.
Quantification was performed by an experimenter blinded to the genotype of the 
mice by using the CellSens software with at least 20 measurements per mouse. 
Pharmacological inhibition of RIPK1 kinase activity. Mice were fed with 
rodent chow containing 100 mg kg−1 of the RIPK1 kinase inhibitor GSK3540547A
(GSK’547A) (GlaxoSmithKline LLC) starting a week before mating and kept on 
this diet throughout pregnancy until caesarean section at the indicated time points. 
Statistics and reproducibility. Group size was determined based on preliminary
datasets. Statistical significance was determined using unpaired, two-tailed
parametric Student’s t-test. One- or two-way ANOVA with Tukey’s multiple
comparisons test was applied. A 95% confidence interval was used for statistics
and P < 0.05 was considered significant. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001. Multiplicity-adjusted P values are reported for multiple
comparisons. All statistical analyses were performed using GraphPad Prism 6. Statistical
transformations for RNA-seq were performed with DESeq2, and P values were
adjusted using the Benjamini-Hochberg method. All in vitro experiments were performed at
least twice with similar results. Unless indicated in figure legends, in vivo
experiments were performed with at least two embryos per genotype. At least three
embryos were considered for statistical testing. The experiments were not
randomized.
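
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal sketch of the step-up procedure in plain Python. It illustrates the adjustment only and is not the DESeq2 implementation, which applies further steps such as independent filtering. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p:.3f}  adjusted P = {q:.3f}")
```

A gene is then called significant when its adjusted P value falls below the chosen false discovery rate.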

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. RNA-sequencing analysis data are available from the Sequence 
Read Archive (SRA) database SRP134865 (BioProject accession PRJNA437851) 
and comparative datasets including genes differentially regulated between
embryo homogenates with different mutations are displayed in Supplementary 
Table 1.

Extended Data Fig. 1 | HOIL-1-deficient mice die at mid-gestation.
a, Schematic representation of the Hoil-1-knockout strategy. Solid boxes
represent Hoil-1 exons and grey boxes with a star indicate the targeted
exons. Boxes with diagonal and horizontal stripes represent loxP and
Frt sites, respectively. b, Specificity of gene recombination was assessed
by Southern blotting with 5’ and 3’ probes external to the construct in
four clones (14B8, 14F6, 20D7 and 21F7). Digestion of the DNA with ApaI,
followed by hybridization with the 3’ probe, was expected to show a
5,700-bp band for the wild-type allele and a 7,700-bp band for the mutant
allele. All four clones appeared to have the correct recombination on
the 3’ side. Digestion of the DNA with SphI and hybridization with the 5’
probe was expected to show a 4,500-bp wild-type band and a 6,200-bp
[…] correctly recombined on the 5’ side. Finally, cutting the DNA with ApaI
and hybridizing with a hygromycin probe showed a single band in all
clones, indicative of a single integration of the construct in all four ES
[…] C20Hoil-1+/− mice. Asterisk indicates dead embryo. f, Representative
images of C20Hoil-1+/− and C20Hoil-1−/− embryos from E9.5 to
E11.5 as quantified in e. Scale bars, 2 mm. g, Single staining showing
vascularization (PECAM-1, top) and apoptosis (cleaved CASP3, bottom)
of yolk sacs. Merged image is shown in Fig. 1c. h, Whole-mount TUNEL
[…] independent experiments). c, SHARPIN immunoprecipitation was
performed in Tnf−/−Hoil-1−/− MEFs reconstituted with HOIL-1 or a
combination of HOIP and SHARPIN and analysed by western blotting
(n = 2 independent experiments). For gel source data, see Supplementary
Fig. 1.

Extended Data Fig. 4 | Ablation of the kinase activity of RIPK1 in
HOIL-1- or HOIP-deficient embryos prevents cell death and lethality
at mid-gestation but not at late gestation. a, b, Quantification of
genotypes of animals obtained after intercrossing Ripk1K45AHoil-1+/−
(a) and Ripk1K45AHoip+/− (b) mice. Asterisk indicates dead embryo.
c, Representative images of embryos quantified in b. Asterisks denote
poor yolk sac vascularization. Scale bar, 2 mm. d, Whole-mount TUNEL
staining of embryos (n = 2 embryos). Scale bar, 2 mm. e, Single staining
showing vascularization (PECAM-1, top) and apoptosis (cleaved CASP3,
bottom) of yolk sacs. Merged image is shown in Fig. 3b. f, g, Representative
images of cell death in different organs (f) and quantification (g) as
detected by TUNEL staining at E14.5 (n = 3 embryos per genotype). Scale
bar, 50 μm (f). Mean ± s.e.m. (n = 3 embryos per genotype). P values from
one-way ANOVA are reported. h, Representative images of H&E staining
on whole embryo paraffin sections (n = 3 embryos per genotype). Asterisk
denotes pericardial effusion. n, necrotic area. Scale bar, 200 μm. i, Cell
death was analysed by propidium iodide (PI) staining in MEFs stimulated
with TNF for 24 h plus the indicated cell death inhibitors. Mean ± s.e.m.
(n = 3 independent experiments). P values from two-way ANOVA are
reported.

Extended Data Fig. 6 | Combined deletion of RIPK3 and caspase-8
prevents cell death but not embryonic lethality at late gestation that
is caused by the loss of HOIL-1. a, Quantification of genotypes of
animals obtained from intercrosses of Ripk3−/−Casp8+/−Hoil-1+/− with
Ripk3−/−Casp8+/−Hoil-1+/− mice (left) or Ripk3−/−Casp8−/−Hoil-1+/−
mice (right). b, Health status of Ripk3−/−Casp8+/−Hoil-1−/− and
Ripk3−/−Casp8−/−Hoil-1−/− embryos at different developmental
stages. c, Single staining showing vascularization (PECAM-1, top)
and apoptosis (cleaved CASP3, bottom) of yolk sacs. Merged image is
shown in Fig. 3f. Scale bar, 50 μm. d, Cell death as detected by whole-mount
TUNEL staining in yolk sacs at E14.5 (left) and respective
quantification (right). Mean ± s.e.m. (n = 3 embryos per genotype).
P values from one-way ANOVA are reported. e, f, Representative
images (e) and quantification (f) of cell death in different organs
as detected by TUNEL staining at E13.5 (n = 3 embryos per
genotype) and E14.5 (n = 5 for Ripk3−/−Casp8−/−Hoil-1+/−, n = 2 for
Ripk3−/−Casp8+/−Hoil-1−/− and Ripk3−/−Casp8−/−Hoil-1−/− lung and
liver and n = 3 Ripk3−/−Casp8−/−Hoil-1−/− heart). Scale bars, 50 μm. Data
are mean ± s.e.m. g, Cell death was analysed by propidium iodide (PI)
staining in MEFs stimulated or not with the indicated ligands for 24 h.
Data are mean ± s.e.m. (n = 3 independent experiments). P values
from two-way ANOVA are reported. h, Representative images of H&E
staining on E13.5 whole embryo paraffin embedded sections (n = 3 for
Ripk3−/−Casp8−/−Hoil-1+/− and Ripk3−/−Casp8−/−Hoil-1−/− and n = 2
for Ripk3−/−Casp8+/−Hoil-1−/−). Asterisks denote pericardial effusion.
Arrows denote congested vessels. Scale bar, 200 μm. i, Representative
microfocus computed tomography scan images of whole E13.5
embryos (n = 3 embryos per genotype). Asterisks denote pericardial
effusion.

Extended Data Fig. 7 | Combined deletion of MLKL and caspase-8
promotes survival of LUBAC-deficient mice.
a, Quantification of genotypes of animals obtained from intercrosses
of Mlkl−/−Casp8+/−Hoip+/− with Mlkl−/−Casp8−/−Hoip+/− mice.
Asterisk denotes dead embryo. b, Representative images of adult mice
as quantified in a. c, Kaplan-Meier plot of mouse survival (n = 6 for
Mlkl−/−Casp8−/−Hoip−/− and n = 9 for Mlkl−/−Casp8−/−Hoil-1−/−
mice). d, Representative images of H&E staining of the indicated organs
(n = 3 mice per genotype). Scale bars, 200 μm. e, Representative images
of yolk sac vascularization (PECAM-1, red) and apoptosis (cleaved
CASP3, green) (top) at E13.5 and respective quantifications (bottom).
Data are mean ± s.e.m. (n = 5 for Mlkl−/−Casp8−/−Hoil-1+/− and
Mlkl−/−Casp8−/−Hoil-1−/− and n = 2 for Mlkl−/−Casp8+/−Hoil-1−/−).
Statistical significance was determined with unpaired two-tailed t-tests
comparing Mlkl−/−Casp8−/−Hoil-1+/− and Mlkl−/−Casp8−/−Hoil-1−/−
embryos. f, Representative H&E staining images of the indicated
organs (n = 3 embryos per genotype). Scale bars, 200 μm. g, Epidermal
thickness quantification of mice of the indicated genotypes in f. Data
are mean ± s.e.m. (n = 3 mice per genotype). Statistical significance was
determined with unpaired two-tailed t-tests. h, Western blot analysis of
lysates from whole E13.5 embryos of the indicated genotypes and L929
cells treated with or without TNF plus zVAD-fmk (TZ) for 2 h as antibody
validation (n = 4 embryos per genotype, performed twice). For gel source
data, see Supplementary Fig. 1.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Extended Data Fig. 8, graphic. Panels a and c-p (Venn diagram, FACS profiles, quantification plots, micrographs and computed tomography images) are described in the caption. The table from panel b follows.]

Pathway ID   Pathway description                                   Gene count    FDR
Low in Ripk3−/−Caspase-8−/−Hoil-1−/− vs Mlkl−/−Caspase-8−/−Hoil-1−/−
GO.0030218   erythrocyte differentiation                           8             4.30E-06
GO.0034101   erythrocyte homeostasis                               8             5.42E-06
GO.0006779   porphyrin-containing compound biosynthetic process    5             9.94E-05
GO.0048821   erythrocyte development                               5             0.00012
GO.0006783   heme biosynthetic process                             4             0.000958
GO.0030099   myeloid cell differentiation                          [illegible]   0.00702
GO.0046501   protoporphyrinogen IX metabolic process               3             0.00702
GO.0065008   regulation of biological quality                      23            0.00702
GO.0044699   single-organism process                               53            0.00794
High in Ripk3−/−Caspase-8−/−Hoil-1−/− vs Mlkl−/−Caspase-8−/−Hoil-1−/−
GO.0007155   cell adhesion                                         10            0.000809

Extended Data Fig. 8 | Combined deletion of RIPK3 and caspase-8
causes haematopoietic defects and RIPK1-dependent embryonic
lethality in HOIL-1-deficient mice. a, Venn diagram depicting genes
differentially expressed by RNA-seq analysis between E13.5 embryos of
the indicated genotypes. b, Gene Ontology (GO) enrichment analysis
of differentially expressed genes (85 low and 35 high in a). FDR, false
discovery rate. c, Representative FACS profile of E13.5 fetal liver
cells with different erythroblast populations gated according to their
CD71 and TER119 expression levels (R1-R5) and quantification. R1
contains immature red blood cell progenitors, including primitive
and later-stage erythroid progenitor cells (erythroid burst-forming
unit (BFU-E) and colony-forming unit (CFU-E), respectively); R2
comprises mainly pro-erythroblasts and early basophilic erythroblasts;
R3 contains both early and late basophilic erythroblasts; R4 is composed
of chromatophilic and orthochromatophilic erythroblasts; and R5
consists of late orthochromatophilic erythroblasts and reticulocytes.
Data are mean ± s.e.m. (n = 14 Ripk3−/−Casp8−/−Hoil-1+/−, n = 8
Ripk3−/−Casp8−/−Hoil-1−/−, n = 5 for Mlkl−/−Casp8−/−Hoil-1+/− and
n = 3 for Mlkl−/−Casp8−/−Hoil-1−/− fetal livers). P values from two-way
ANOVA are reported. d, h, k, Representative FACS profile of E13.5
fetal liver cells for the indicated haematopoietic populations (sample
size specified in e-g, i, j). e, f, j, Total cell number of the different
haematopoietic cell subsets in fetal liver cell suspensions from E13.5
embryos of the indicated genotypes gated as in d, h and k, respectively.
Total number of multipotent progenitors (LSK and LK cells) (e), mature
CD45+ blood cells, including granulocytes (GR-1+) and macrophages
(F4-80+) (f) and myeloid progenitors (common myeloid progenitor
(CMP), granulocyte-monocyte progenitor (GMP) and megakaryocyte-erythrocyte
progenitor (MEP)) (j). Data are mean ± s.e.m. P values
from unpaired two-tailed t-tests are shown. g, i, Percentages of mature
CD45+ leucocytes, GR-1+ and F4-80+ cells (g) and CMP, GMP and
MEP cells (i). Data are mean ± s.e.m. P values from unpaired two-tailed
t-tests are shown. l, Differentiation of E13.5 fetal liver (c-KIT+)
progenitors into CFU-granulocytes and macrophages (GM), BFU-E
and/or CFU-granulocyte, erythroid, macrophage, megakaryocyte
(GEMM). Mean ± s.e.m. (n = 2 fetal livers). m, Micrographs of
differentiated macrophages (n = 3 Ripk3−/−Casp8−/−Hoil-1+/− and
Ripk3−/−Casp8−/−Hoil-1−/−, n = 5 Mlkl−/−Casp8−/−Hoil-1+/− and
n = 4 Mlkl−/−Casp8−/−Hoil-1−/− fetal livers) and percentage viability
of macrophages from E13.5 fetal liver cell suspensions from embryos
of the indicated genotypes in the presence or absence of the indicated
inhibitors. Data are mean ± s.e.m. (n = 3 Ripk3−/−Casp8−/−Hoil-1+/−
and Ripk3−/−Casp8−/−Hoil-1−/−, n = 5 Mlkl−/−Casp8−/−Hoil-1+/− and
n = 4 Mlkl−/−Casp8−/−Hoil-1−/− fetal livers). P values from two-way
ANOVA are shown. o, Microfocus computed tomography scan images
of Ripk3−/−Casp8−/−Hoil-1−/− embryos showing maximum intensity
projections, with windowing applied to highlight vasculature (high
contrast). No anatomical defects that would explain destruction of red
blood cells or poor distribution of blood to the peripheries were found
(n = 3 embryos). In the left image, yellow star denotes distal aorta, green
star denotes umbilical vessels, and red star indicates descending thoracic
aorta. In the right image, yellow star denotes carotid artery, red star
denotes descending thoracic aorta, white star denotes ductus arteriosus,
and blue star denotes ascending thoracic aorta. p, Representative FACS
profile of a pool of three E11.5 dorsal aortas, containing the AGM
region, per indicated genotype and quantification. This experiment was
performed once with three embryos per genotype.

Extended Data Fig. 9 | Concomitant deletion of RIPK1 prevents
embryonic lethality of Ripk3−/−Casp8−/−Hoil-1−/− mice.
a, Kaplan-Meier plot of mouse survival (n = 17 for
Ripk1−/−Ripk3−/−Casp8−/−Hoip−/− and n = 2 for Ripk1+/−Ripk3−/−Casp8−/−Hoip−/−
mice). b, Quantification of genotypes of animals obtained from
intercrosses of Ripk1+/−Hoil-1+/− mice. For simplicity, not all possible
genotypes are represented. c, Percentage viability of macrophages
from E13.5 fetal liver cell suspensions from embryos of the indicated
genotypes. Data are mean ± s.e.m. (n = 5 fetal livers per genotype). Statistical
significance was determined with unpaired two-tailed t-tests. d, Cytokine
arrays from Ripk3−/−Casp8−/−Hoil-1+/− and Ripk3−/−Casp8−/−Hoil-1−/−
embryos (left) and table listing the altered cytokines (right). Red squares
highlight the differences (n = 1 for each genotype). For gel source data,
see Supplementary Fig. 1. e, Cytokine analysis in homogenates from
embryos of the indicated genotypes. Data are mean ± s.e.m. (n = 3
embryos per genotype). P values from one-way ANOVA are reported.
f, Representative images of E16.5 embryos from control mothers or
mothers fed with the RIPK1 kinase inhibitor GSK’547A from mating
and throughout gestation (embryos treated with GSK’547A: n = 5 for
Ripk3−/−Casp8−/−Hoil-1+/−, n = 7 for Ripk3−/−Casp8+/−Hoil-1−/− and
n = 3 for Ripk3−/−Casp8−/−Hoil-1−/−). Scale bar, 5 mm.

Extended Data Fig. 10 | Schematic representation of findings in this
study. a, Diagram indicating extent of viability and phenotypes of single,
double, triple and quadruple knockout mice. Red lines indicate cell
death and loss of yolk sac vascularization phenotype. Green line indicates
mild cell death phenotype without loss of yolk sac vascularization.
Asterisk indicates that heart defects were observed. b, Proposed model
of LUBAC function during embryogenesis. At mid-gestation (left),
LUBAC maintains vascular tissue integrity by preventing aberrant
TNF/LT-α-mediated caspase-8- and RIPK3/MLKL-induced cell death.
At late gestation, LUBAC is required not only to prevent aberrant cell
death but also to prevent severe defects in haematopoiesis that are driven
by RIPK1 but can be prevented by RIPK3 (middle). Genetic ablation of
LUBAC and of different components of the cell death machinery indicates
that (right): (1) in the absence of LUBAC, caspase-8 and RIPK3, RIPK1
provokes lethality, probably by depleting multipotent progenitors in the
haematopoietic compartment; (2) in the absence of caspase-8 and MLKL,
cell death induced by loss of LUBAC is prevented and RIPK3 is present
to exert its protective role on fetal haematopoiesis by precluding aberrant
RIPK1 signalling; and (3) in the absence of caspase-8 and RIPK3, the
presence of LUBAC is sufficient to prevent RIPK1 from causing severe
defects in haematopoiesis and lethality since Ripk3−/−Casp8−/− mice are
viable. This indicates that RIPK3 and LUBAC can compensate for
each other to block aberrant RIPK1 signalling.

Reverse transcription of the HIV-1 RNA genome into double-stranded
DNA is a central step in viral infection and a common
target of antiretroviral drugs. The reaction is catalysed by viral
reverse transcriptase (RT) that is packaged in an infectious virion
with two copies of viral genomic RNA, each bound to host lysine 3
transfer RNA (tRNALys3), which acts as a primer for initiation of
reverse transcription. Upon viral entry into cells, initiation is slow
and non-processive compared to elongation. Despite extensive
efforts, the structural basis of RT function during initiation has
remained a mystery. Here we use cryo-electron microscopy to
determine a three-dimensional structure of an HIV-1 RT initiation
complex. In our structure, RT is in an inactive polymerase
conformation with open fingers and thumb and with the nucleic
acid primer-template complex shifted away from the active site. The
primer binding site (PBS) helix formed between tRNALys3 and HIV-1
RNA lies in the cleft of RT and is extended by additional pairing
interactions. The 5’ end of the tRNA refolds and stacks on the PBS
to create a long helical structure, while the remaining viral RNA
forms two helical stems positioned above the RT active site, with a
linker that connects these helices to the RNase H region of the PBS.
Our results illustrate how RNA structure in the initiation complex
alters RT conformation to decrease activity, highlighting a potential
target for drug action.

During the initiation phase of reverse transcription, RT must bind
productively to the viral RNA-tRNALys3 complex and then navigate a
highly structured 5’ region of the HIV-1 genome. Critical elements
within the viral RNA and host tRNA that are necessary for efficient
initiation have been identified. RT pauses at discrete locations, is
generally slowed during initiation compared to elongation, and can
bind the viral RNA-tRNALys3 primer site in different orientations. A
rich body of structural data on RT, a heterodimer of p51 and p66 subunits,
has shown how its polymerase and RNase H domains interact
with DNA-DNA and DNA-RNA duplexes in the absence and presence
of antiviral drugs. Lacking, however, are structures that reflect initiation,
showing how RT binds to a large bimolecular viral RNA-tRNALys3
complex.

We used cryo-electron microscopy (cryo-EM) complemented by
biochemical and biophysical experiments to determine the molecular
architecture of an HIV-1 reverse transcriptase initiation complex
(RTIC). The RTIC was formed using a 101-nucleotide fragment of
HIV-1 genomic RNA (vRNA) that encompasses the primer binding
site and additional RNA elements required for efficient initiation of
reverse transcription (Fig. 1a). A binary vRNA-tRNA complex was
formed with human tRNALys3 that contained a specific cross-linkable
nucleotide (‘convertible G’) at position 71. The RTIC is kinetically
labile, undergoing rapid RT dissociation from the tRNA-vRNA
complex with several distinct binding orientations. To stabilize the
RTIC for structural characterization, the vRNA-tRNA complex was
specifically cross-linked to RT containing a Q258C mutation in the p66
subunit (Fig. 1b), which interacts in the minor groove of RT-nucleic
acid complexes. After extending the tRNA primer by one dideoxynucleotide
to achieve the highest crosslinking efficiency, we generated
the cross-linked vRNA-tRNA-RT ternary complex and purified it from
free RT and RNA (Fig. 1c, Extended Data Fig. 1a-c). Crosslinking did
not affect the global activity of the RTIC. The final cross-linked HIV-1
RTIC had equivalent total activity in incorporation of the next dNTP as
an un-crosslinked initiation complex, with rates that are only threefold
slower, and is strongly inhibited by nevirapine, a non-nucleoside RT
inhibitor (NNRTI) that works through conformational modulation
of RT (Extended Data Fig. 1d-h). The RTIC studied structurally
here thus represents an active functional state of reverse transcription
initiation.

We first assessed the quality of the RTIC sample by negative stain
electron microscopy, which confirmed a homogeneous RT-RNA
complex (Extended Data Fig. 2a). Upon cryo-EM preparation, however,
the complex dissociated. This problem was alleviated by addition
of beta-octyl glucoside, which resulted in monodisperse single particles
that we visualized by cryo-EM (Extended Data Fig. 2b, c).
Two-dimensional class averages of RTIC clearly showed the RT core
as well as protruding RNA densities (Extended Data Fig. 2d). Three-dimensional
classification of particle projections revealed substantial
conformational variability in the apex of RNA densities (Extended
Data Fig. 3a). Owing to this segmental flexibility, we obtained a low-resolution
(8.0 Å) reconstruction that best describes the global
architecture of RTIC (Fig. 2a, Extended Data Fig. 3b). This EM-density
map, encompassing protein and all RNA regions, was of sufficient quality
to visualize the tRNA and vRNA, thereby enabling us to position
approximately the RNA structures located outside the RT binding
cleft. In addition, we obtained a 4.5 Å map by masking out the dynamic
peripheral RNA elements and focusing the particle classification and
structure refinement on the RT, primer binding site (PBS) helix in the
cleft, and additional helical tRNA density (Fig. 3, Extended Data Fig. 3).
This higher-resolution map allowed us to describe the conformation
of RT and the RNA inside the binding cleft (Fig. 3, Extended Data
Fig. 4). An independent 8.2 Å cryo-EM reconstruction of the RTIC
was determined in low salt and Mg2+ and revealed a very similar global
conformation for the complex (Extended Data Fig. 5), suggesting that
the RTIC architecture has limited salt dependence. Models were constructed
using the 8.0 Å map to define the global RTIC architecture
and the 4.5 Å map to define the structural features of the RTIC core
and active site. While the 4.5 Å map provided sufficient resolution to
orient the PBS helix of the RTIC, the orientation of the peripheral RNA
helical elements of the vRNA and tRNA into the 8.0 Å map was more
subjective and relied on iterative Rosetta modelling using an accepted
secondary structure from past biochemical and biophysical data
(see Methods).

Fig. 1 | RTIC constructs and purification. a, HIV-1 viral RNA (NL4.3)
and tRNALys3 sequences used in RTIC formation. The viral RNA includes
sequences complementary to the tRNA primer (coloured). Interactions
between the regions are reported to be involved in the regulation of
initiation. b, The crosslinking scheme used for purification of the RTIC.
The N2-cystamine-dG was placed at position 71 of the tRNA primer. After
extending the primer by one nucleotide, a ddCTP (red), a disulfide bond
forms between G71 and mutated C258 on the RT p66 thumb subdomain.
c, Non-reducing SDS-PAGE gel of the free vRNA-tRNA, RT, and
crosslinked RTIC. RT runs as two bands corresponding to the two
subunits. The annealed vRNA-tRNA complex runs as a single band on
the gel. The purified RTIC runs as two bands corresponding to the p51
subunit and the crosslinked p66-vRNA-tRNA. Gel analysis was performed
on all samples used in the manuscript (>10) and consistently exhibited
similar results.

The overall RTIC structure shows the RT core with RNA double-helical
density within the binding cleft that spans from the active site to
the RNase H domain. The helical RNA in the cleft corresponds to the
HIV-1 PBS helix formed between nucleotides (nts) 59-76 in the tRNA
and 182-199 in the vRNA, with the addition of one ddCTP nucleotide
needed to elongate the complex and allow efficient RT-RNA crosslinking
(Fig. 2c). The helical density for this +1 extended PBS helix is further
extended near the RNase H domain by formation of an additional four
base pairs, probably between complementary tRNA nts 55-58 and viral
RNA nts 200-203. The nucleotide identities of positions 201-203 are
highly conserved among recorded HIV-1 sequences (70% for 201
and >96% for 202/203), suggesting that this is a common structural
feature. In the 8.0 Å global map, a long continuous helical RNA density
is observed to extend away from the RNase H domain (Fig. 2b).
Accordingly, we propose that the 5’ end of the tRNA (nts 1-54) refolds
to form a secondary structure with a contiguous helix (Fig. 2c).
Specifically, the D and anticodon stems from nts 10-44 rearrange
to form a continuous helical structure, which fits the observed density
far better than the three-way junction observed in the free initiation
complex (Extended Data Fig. 6).
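
Per-position conservation values such as those quoted above for positions 201-203 can be computed from a sequence alignment as the frequency of the most common nucleotide in each column. The following is a minimal sketch, assuming pre-aligned sequences of equal length. The function name and the toy alignment are hypothetical and do not represent the HIV-1 sequence database used in the study.

```python
from collections import Counter


def position_conservation(aligned_sequences, position):
    """Return (most common residue, its frequency) at a 0-based column."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    # Collect the residue found at this column in every sequence.
    column = [seq[position] for seq in aligned_sequences]
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


# Toy alignment of four three-nucleotide segments (hypothetical data).
toy_alignment = ["GAC", "GAC", "UAC", "GAC"]
for column_index in range(3):
    residue, frequency = position_conservation(toy_alignment, column_index)
    print(f"column {column_index}: {residue} conserved in {frequency:.0%}")
```

In practice, gaps and ambiguous bases in real alignments would need explicit handling before such frequencies are interpreted.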

Fig. 2 | Global architecture of the RTIC. a, Unmasked 3D reconstruction
of the entire RTIC at a global resolution of 8.0 Å. A model of RT (p66,
purple; p51, grey) is docked into the map. A low-threshold map has been
overlaid into the density to illustrate the helical nature of the bound RNA
duplex. b, Global model of the RTIC with the vRNA-tRNA components.
The electron micrograph density accounts for the majority of vRNA and
tRNA structure. Density corresponding to the upper HIV helix 2 stem
loop is missing, suggesting that it is partially disordered. c, Proposed
secondary structure of the vRNA-tRNA bound within the RTIC. The
majority of vRNA helices are well accounted for in the density, with the
exception of the apical portions of helix 2 (faded). Additional base pairs
(boxed) between the vRNA and the tRNA, which extend the PBS helix, are
consistent with the continuous helix that spans the RT binding cleft. The
tRNA has refolded and adopted an extended helical conformation.

The helical refolded tRNA domain is connected by a single-stranded
connection loop to a 7-bp helix (H1) involving the 5’ (nts 125-131) and
3’ (nts 217-223) termini of the viral RNA construct (Fig. 2b, c) and
containing the conserved primer activation signal (PAS) sequence. H1
and the connection loop form a bridge between the RNA located in the
RNase H domain and that located near the active site of RT. A three-way
RNA junction is formed by the PBS, H1 and a second helical stem loop
(H2) comprising nts 134-178 of HIV-1 viral RNA. Density consistent
with single-stranded RNA connects H2 to the PBS in the active site.
The relative strength (indicative of stability) of the EM density for H1,
the connection loop, and the apical regions of H2 differs among several
of our low-resolution classes, as do their orientations with respect to
the base of H2 and the PBS (Extended Data Fig. 7a, b). For classes that
contain strong density of these RNA features, similar models fit these
maps by treating the helical RNA elements as rigid units around flexible
junction regions (Extended Data Fig. 7c). The presence of helix H1
was confirmed by single-molecule Förster resonance energy transfer
(FRET) experiments, in which Cy3 dye was attached to the 5’ phosphate
of the vRNA and a Cy5-labelled oligonucleotide was hybridized to an
extension on the 3’ end (Extended Data Fig. 8a). In this experiment,
observation of a high FRET state would indicate H1 formation. In the
buffer conditions used for cryo-EM imaging, we find that more than
95% of RTIC molecules are in a stable, high-FRET state, indicating
that H1 forms for a surface-immobilized RTIC at room temperature
(Extended Data Fig. 8b, c).
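
The logic of the single-molecule FRET readout can be sketched as follows: an apparent FRET efficiency is computed per molecule from donor (Cy3) and acceptor (Cy5) intensities, and the fraction of molecules above a cutoff is reported. This is an illustrative sketch only. The uncorrected intensity ratio, the 0.7 cutoff, the function names and the example intensities are all assumptions and are not the analysis pipeline used in the study.

```python
def fret_efficiency(donor_intensity, acceptor_intensity):
    """Apparent (uncorrected) FRET efficiency, E = I_A / (I_A + I_D)."""
    total = donor_intensity + acceptor_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return acceptor_intensity / total


def fraction_high_fret(efficiencies, threshold=0.7):
    """Fraction of molecules whose efficiency is at or above the cutoff."""
    if not efficiencies:
        raise ValueError("no molecules supplied")
    high = sum(1 for e in efficiencies if e >= threshold)
    return high / len(efficiencies)


# Hypothetical (donor, acceptor) mean intensities for four molecules.
molecules = [(100.0, 900.0), (150.0, 850.0), (700.0, 300.0), (200.0, 800.0)]
efficiencies = [fret_efficiency(d, a) for d, a in molecules]
print(f"high-FRET fraction: {fraction_high_fret(efficiencies):.2f}")
```

A real analysis would also correct for background, spectral crosstalk and detector efficiency before assigning molecules to FRET states.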

Although the RTIC is active in the addition of the next dNTP 
(Extended Data Fig. 1d, e, h), the complex adopts an inactive confor- 
mation in which the position of the tRNA primer terminus within the 
palm subdomain is shifted approximately 13 A away from the active 
site of RT, reminiscent of nucleic acid-RT complexes bound with an 
NNRTI” (Fig. 4a, Extended Data Fig. 9b). As observed in RT structures 
with bound NNRTI, the primer grip (B12-B13-B14 sheet) is displaced 
towards the 3’ terminus of the primer strand’. The PBS helix is not 
Fig. 3 | Structure of the RTIC core. a, Cryo-EM density map of the RTIC 
core at 4.5 Å resolution. The helical PBS density is regular until it reaches 
the RNase H active site. Extra helical density for an additional four base 
pairs between the vRNA and tRNA is present. Density corresponding 
to the tRNA is located just outside the RNase H domain. The junction 
between the extended PBS and tRNA helix is distorted, possibly owing 
to masking or flexibility in this region. This portion of the RTIC is 


translocated, with the dC77-G181 pair in the nucleotide acceptor site 
(N-site) (Fig. 4). The PBS helix is also lifted about 6.0 Å away from the 
palm and connection domains. The path of the viral RNA template and 
the base of H2 come into close contact with the residues in the fingers 
domain. The fingers domain of RT adopts a semi-open conformation 
similar to that of RT structures bound to nucleic acids that lack an 
incoming nucleotide (Fig. 4a, Extended Data Fig. 9). On the basis of 
previous mechanistic studies of RT enzymology, we conclude that 
the RTIC here is blocked in a pre-translocation conformation for the 
primer-template complex (Fig. 4c). Unlike NNRTI-bound complexes, 
the RTIC is functional and can incorporate the next dNTP, suggest- 
ing that there is conformational plasticity within the RT active site. 
Although RT contacts the RNA substrate using similar domains as in 
previously determined RT-nucleic acid complexes”, the extent of 
these interactions appears different. The thumb and RNase H domains 
make the vast majority of observed RNA contact in the RTIC, with a 
substantial loss of potential interactions in the palm and connection 
subdomains, consistent with decreased RT-RNA affinity in the initia- 
tion complex. The loss of RT-RNA contacts in the palm subdomain” 
arises from displacement of the tRNA primer terminus away from the 
active site (Fig. 4a). Although the RTIC structure is not at sufficient 
resolution to identify specific protein-RNA contacts, there appear to 
be additional RT-RNA interactions involving the fingers domain with 
the vRNA template strand and H2. The sterically bulky vRNA helices 
immediately adjacent to the fingers region form a wedge that hinders 
proper accommodation of the PBS into the cleft and leads to loss of RT- 
RNA contacts in the cleft and displacement of tRNA 3’ end (Fig. 4b); 
this is likely to inhibit translocation of the PBS helix to enable efficient 
and rapid incorporation of the next dNTP. 

The architecture of the vRNA-tRNA complex in the initiation 
complex explains previous experimental results on the role of RNA 
in initiation. The observed RNA conformation is consistent with 
chemical probing and enzymatic mapping on similar binary vRNA-tRNA 
and ternary complexes, which were previously interpreted in 
terms of tRNA-viral RNA pairings. No additional interactions 
between the vRNA and tRNA occur beyond the extended PBS helix at 
the +1 stage of initiation (Fig. 2b), consistent with biochemical results 
on similar HIV-1 subtype-B sequences. Notably absent is any PAS-anti-PAS 
interaction between HIV-1 nts 123-130 and tRNA nts 48-55 
(Fig. 1a), which has been implicated in RT initiation and shown to 
form dynamically in the absence of RT. The formation and 
positions of vRNA H1 and H2 are consistent with their proposed function 
as barriers during initiation. The conserved connection loop, 
bridging RNA within the RNase H domain back to H1, may help to 
position the vRNA helices in the proper orientation for binding of RT 
to the tRNA 3’ terminus. The HIV-1 genomic RNA from the MAL 
helical in the global map (Fig. 2). b, Representative model of the RTIC 
core that accounts for RT and the entire +1 extended PBS helix. There 
is helical density that can accommodate additional base pairs between 
the vRNA and tRNA, but we have not included it in this model. c, Fit of 
the polymerase active site region in the 4.5 Å map. The RNA and protein 
backbone are well modelled in this region of the map. 


isolate, commonly used in past initiation studies, maintains many of 
these sequence elements with an added 23-nt insertion in the connec- 
tion loop that may engage in additional interactions. Both H1 and H2 
are required for efficient initiation of reverse transcription, and their 
displacement and unfolding are required for reverse transcription to 
proceed; melting of H2 during initiation occurs after addition 
of the sixth nucleotide. 

tRNALys3 in the RTIC forms an elongated helical structure compatible 
with an alternative predicted fold that involves an extended PBS 
structure stabilized by RT. This conformation is consistent with the 
presence of modified nucleotides in tRNALys3 (Extended Data Fig. 6d), 
and is favoured by the extended stacking and RT contacts around the 
RNase H domain. The RNA fold in the RTIC is likely to sequester 
important sequences for vRNA-tRNA interactions, such as the PAS-anti-PAS 
and A-rich loop-anticodon sequence interactions, 
which may subsequently form as RNAs rearrange in response to RT 
extension during initiation. The RNA tertiary conformation within 
the RTIC is clearly dynamic, as shown by published single-molecule 
data and suggested by our cryo-EM data. We observe several 
conformations with variable orientations and density for the extended 
tRNA helix and three-way junction of H1, H2 and PBS (Extended Data 
Fig. 7). Such plasticity is likely to be essential for the RTIC to proceed 
to elongation. 

Our results suggest a model of RT initiation in which RNA structure 
regulates RT activity. tRNALys3 and vRNA form a dynamic RNA complex, 
in which the tRNA refolds to form a metastable conformation 
of its 5’ region. The ability to refold in this way could explain the use 
of tRNALys3 in HIV-1 initiation. Although RT contacts the PBS in the 
cleft, the disrupted palm subdomain contacts between the PBS and RT 
explain the poor affinity of RT for the vRNA-tRNA complex. Within 
the framework of the standard dNTP incorporation mechanism, RT 
in this +1 initiation complex adopts a pre-translocation conformation 
with an open active site and improper positioning of nucleic acid 
for catalysis (Fig. 4c). The vRNA helices, whose orientation hinders 
productive binding to RT, must be displaced and/or unfolded for the 
tRNA primer terminus to reposition within the active site such that 
the RT fingers can clamp down on an incoming nucleotide (Fig. 4b). 
The dissociation of RT during initiation is rapid, and competes with 
forward polymerization reactions. RT may dissociate and rebind to 
the vRNA-tRNA to reposition the primer terminus into the active site. 
In this pathway, RT rebinding could facilitate melting of downstream 
RNA structures that hinder translocation. The necessity for these rearrangements 
during early stages of initiation is likely to explain the low 
processivity of initiation and the observed pauses that control the start 
of HIV-1 replication. The single-stranded, A-rich connection loop 
bridging the 3’ end of the vRNA PBS to H2 may position the vRNA 
Fig. 4 | The +1 RTIC adopts an inactive conformation. a, Comparison 
of RTIC primer (red) and template (yellow) strands with RT-dsDNA 
complex (pink, 1RTD) that has the 3’ primer terminus located in the 
P-site with the N-site occupied by a nucleotide. The PBS helix of the 
RTIC must be translocated and shifted in order to reposition into a P-site 
conformation. The thumb is in an open conformation and the primer 
grip has shifted compared to the active structure. The fingers are in a 
semi-open conformation. b, vRNA structure outside the RT active site 
may prevent proper translocation of RNA substrate during initiation. 
Two views of the RNA (vRNA, yellow; tRNA, red) near the active site of 
RT. The arrows indicate the direction in which the RNA must move in 
order for the PBS to reposition into the active site. The global structure 


helices properly and allow conformational communication with the 
RT RNase H domain and refolded tRNA. As reverse transcription 
proceeds, structural rearrangements in vRNA and tRNA must occur 
to favour the transition to processive elongation. Thus, the initia- 
tion complex is likely to change progressively as initiation proceeds, 
and may be specifically vulnerable to inhibition by drugs. Higher- 
resolution structural views of these different states, and dynamics to 
link them together, will be needed to elucidate further the steps of 
initiation and underlying RNA conformations that regulate early steps 
in HIV-1 infection. 
METHODS 


Sample preparation. HIV-1 vRNA constructs were prepared by in vitro transcription 
with T7 RNA polymerase as previously described. Transcripts were 
denatured in 8 M urea and purified on a sequencing PAGE gel. Gel extraction was 
performed using 0.3 M ammonium acetate. Following ethanol precipitation, the 
RNA was dissolved in 10 mM Bis-Tris propane, pH 7.0, 10 mM NaCl and stored 
at −20 °C. The crosslinkable tRNALys3 construct was purchased from TriLink 
Biotechnologies. The crosslinkable RNA primer was chemically synthesized, 
PAGE purified, and analysed by denaturing PAGE and mass spectrometry. During 
synthesis, an N2-cystamine-2'-deoxyguanosine was placed at the 71 position for 
crosslinking purposes. 

vRNA-tRNA complexes were formed by mixing the vRNA and tRNA in a 1:1 
molar ratio at 1 μM each in 10 mM Bis-Tris propane, pH 7.0, 10 mM NaCl. The 
mixture was heated to 90 °C and slow cooled to room temperature. The vRNA-tRNA 
complex was purified away from higher order and unannealed monomer 
species using a Superdex 200 (26/60) gel filtration column with 10 mM Bis-Tris 
propane, pH 7.0, 100 mM NaCl. The presence of a single species was confirmed 
with native PAGE and samples were concentrated on a Vivaspin 20 10,000 MWCO 
concentrator. Samples were stored at −20 °C and exhibited minimal aggregation 
over time. 

HIV-1 RT was expressed in Escherichia coli strain BL21(DE3). Two expression 
vectors, one containing p66 and ampicillin resistance and the other containing p51 
and kanamycin resistance, were constructed. The C terminus of p66 contains an 
unstructured linker and a six-histidine tag. A cysteine mutation for crosslinking 
was introduced into helix H of p66 (Q258C). The protein used in this study 
also had the C280S mutation, introduced in prior structural work, and the E478Q 
mutation, introduced to eliminate RNase H activity, as RT has been shown to cleave 
dsRNA when stalled for long periods. Cell pellets were lysed by sonication 
and the enzyme was purified by gravity Ni-nitrilotriacetic acid (Ni-NTA) affinity 
chromatography, followed by size-exclusion chromatography using a Superdex 200 
(26/600). The His6 tag was cleaved by thrombin digestion overnight. The cleaved 
protein was re-applied to a Ni-NTA column to remove protein with an uncleaved 
His6 tag. This was followed by a final size-exclusion chromatography 
step. The protein was stored at 4 °C in 300 mM NaCl, 50 mM Tris, pH 8.0, 5 mM 
β-met. 

The RTIC was prepared by mixing RT and vRNA-tRNA complex at 2 and 1 μM, 
respectively, in a buffer containing 25 mM NaCl, 25 mM KCl, 5 mM MgCl2, 50 mM 
Tris, pH 7.5, 100 μM ddCTP (or dCTP if used for +2 incorporation assays). The 
mixture was allowed to crosslink overnight at room temperature. The complex 
was purified by anion-exchange chromatography with a linear gradient. This was 
followed by a size-exclusion chromatography step to remove any higher-molecular- 
weight aggregates. The purity and homogeneity of the final complex were assessed 
by SDS-PAGE (under non-reducing conditions) and size-exclusion chromato- 
graphy (Extended Data Fig. 1). 

Amino-GMP-labelled viral RNA for single-molecule experimentation was tran- 
scribed as previously described but with nucleotide concentrations of 1mM ATP, 
CTP and UTP and 0.5mM GMP. The vRNA sequence is identical to that used in 
the cryo-EM experiments, but contains an additional unstructured sequence on 
the 3’ end for immobilization and oligonucleotide hybridization purposes and an 
additional GGU on the 5’ end for labelling purposes. 5’-Amino-G-monophosphate 
(GMP), purchased from TriLink Biotechnologies, was added to the reaction at a 
final concentration of 1 mM. The reaction was incubated at 37°C for 4h. 5’-Amino- 
GMP-labelled RNA was purified by phenol/chloroform extraction followed by a 
10DG (Bio-Rad) desalting column in 10mM Bis-Tris (pH 7.0), 75 mM NaCl. The 
RNA was then separated from template DNA and free NTPs by size-exclusion 
chromatography (ENRICH SEC 650 10 x 300) in 100mM sodium phosphate 
buffer (pH 8.2), 75mM NaCl. Purified amino-GMP-labelled RNA was concen- 
trated to 141M and labelled using NHS chemistry with 1,000-fold excess cyanine 
dye (Lumiprobe). Excess dye was removed by passage over a 10DG desalting column 
followed by size-exclusion chromatography (ENRICH SEC 650 10 x 300) purification 
to buffer exchange the labelled vRNA. Labelling efficiency was calculated 
by measuring the absorbance values of the labelled species at both 260 nm (RNA 
absorbance) and 550 nm (Cy3 absorbance). These absorbance values were used to 
calculate the concentrations of the RNA and the Cy3 dye. Using the ratio between 
these two values, we estimate that our 5’ labelling efficiency is approximately 70%. 
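
As a minimal sketch of the labelling-efficiency estimate described above, the two absorbance readings can be converted to concentrations with the Beer-Lambert law and then divided. The extinction coefficients and absorbance readings below are placeholder values chosen for illustration, not the values used in this study, and no correction for dye absorbance at 260 nm is applied.

```python
def concentration_molar(absorbance, extinction_coeff, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), with epsilon in M^-1 cm^-1."""
    if extinction_coeff <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (extinction_coeff * path_cm)


def labelling_efficiency(a260, a550, eps_rna_260, eps_dye_550, path_cm=1.0):
    """Ratio of dye concentration to RNA concentration (1.0 = fully labelled)."""
    rna = concentration_molar(a260, eps_rna_260, path_cm)
    dye = concentration_molar(a550, eps_dye_550, path_cm)
    return dye / rna


# Placeholder readings and coefficients (assumptions for illustration only).
efficiency = labelling_efficiency(
    a260=1.0,          # hypothetical RNA absorbance at 260 nm
    a550=0.07,         # hypothetical Cy3 absorbance at 550 nm
    eps_rna_260=1.0e6,  # hypothetical RNA extinction coefficient
    eps_dye_550=1.0e5,  # hypothetical dye extinction coefficient
)
print(f"labelling efficiency: {efficiency:.0%}")
```

With real data, the RNA extinction coefficient would be computed from the construct sequence and the dye coefficient taken from the supplier's datasheet.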

Dye-labelled vRNA-tRNA complexes were heat-annealed and purified as 
previously described. The single-molecule RTIC complex was prepared as stated 
above, but with several modifications. To simplify the purification, the His tag 
was kept on the p66 subunit of RT. The RTIC was then applied to a Ni-NTA column 
and washed with 300 mM NaCl to remove the free vRNA-tRNA complexes. 
The RTIC was eluted from the column. Synthetic oligonucleotides with sequences 
5’-GCGGGAGAUCAGGCAU(Am6)-cyanine5-3’ and 5’/-biotinCUAUUCCCU- 
AUCCdC-3’ (Trilink) were annealed to the complex at 37°C for 5 min in tenfold 
molar excess. Excess oligonucleotides and free RT were rinsed away during TIRF 
slide preparation. The above protocol was also performed for a dye-labelled vRNA-tRNA-only 
control, but without the RTIC complex formation, purification, and 
free RT rinse. 

Single-molecule FRET experiments. Single-molecule FRET experiments were 
performed using a prism-based total internal reflection instrument with a diode-pumped 
solid-state 532-nm laser as previously described. This includes 
the use of an oxygen scavenging system (protocatechuate 3,4-dioxygenase 
(PCD) and β-carboxy-cis,cis-muconic acid (PCA)) and a triplet state quencher 
(6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox)) to reduce 
aberrant dye behaviours. The laser power measured 50 mW at the prism. The 
fluorescence signal was recorded with an exposure of 100 ms per frame for 5 min 
at room temperature. FRET traces were manually analysed using home-written 
scripts in MATLAB (MathWorks)”’. This analysis began by using a colocalization 
script to select only spots that exhibited both Cy3 donor and Cy5 acceptor fluores- 
cence (under donor-only excitation conditions). Such colocalization allowed us to 
eliminate partially labelled molecules. Next, FRET traces were manually inspected 
to eliminate cases with multiple single-dye photobleaching events (multiple mol- 
ecules) or traces that exhibited poor dye photophysics (Extended Data Fig. 8d). 
After this manual inspection, the final dataset used for analysis included only 
traces in which both dyes exhibited clear single photobleaching events to ensure 
reliable data (Extended Data Fig. 8c). For our single-molecule experiment, 708 
traces were selected through colocalization. After manual inspection and elimi- 
nation of poor traces, 480 traces were used for the final analysis. In addition to the 
RTIC experiment, a control experiment using a dye-labelled vRNA-tRNA was 
performed to assess the FRET states without RT. We found that in the absence of 
cross-linked RT, a small population of low-FRET-state molecules with FRET effi- 
ciency 0.3 existed, but this state was not observed upon binding and cross-linking 
of RT (data not shown). 
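
The trace analysis above can be illustrated with a short sketch. It assumes the common apparent FRET efficiency E = I_A / (I_D + I_A) without leakage or detection-efficiency corrections, and a hypothetical threshold of 0.5 separating high-FRET from low-FRET traces; neither choice is stated in the text, and the intensity values are synthetic.

```python
def apparent_fret(donor, acceptor):
    """Apparent FRET efficiency from background-subtracted intensities."""
    total = donor + acceptor
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return acceptor / total


def mean_fret(donor_trace, acceptor_trace):
    """Average apparent FRET over the frames of one trace."""
    values = [apparent_fret(d, a) for d, a in zip(donor_trace, acceptor_trace)]
    return sum(values) / len(values)


def fraction_high_fret(mean_values, threshold=0.5):
    """Fraction of traces whose mean FRET is at or above the threshold."""
    high = sum(1 for e in mean_values if e >= threshold)
    return high / len(mean_values)


# Two synthetic traces (arbitrary intensity units), for illustration only.
trace_high = mean_fret([200, 210, 190], [800, 790, 810])
trace_low = mean_fret([700, 700], [300, 300])
high_fraction = fraction_high_fret([trace_high, trace_low])

# Fraction of colocalized traces kept after manual inspection (480 of 708).
retained_fraction = 480 / 708
print(f"retained after inspection: {retained_fraction:.1%}")
```

Applied to the full set of inspected traces, `fraction_high_fret` would give the population fraction in the high-FRET state.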

Negative-stain EM. We applied 3.5 μl of 0.1 μM RTIC sample onto glow-discharged 
carbon-coated grids, and blotted and stained them with 1% uranyl 
formate according to standard protocols. Negative-stained grids were imaged 
on an FEI Morgagni at 100 kV. 

Cryo-EM data acquisition. RTIC complex in high monovalent salt buffer (300 mM 
NaCl, 10 mM Tris-HCl pH 8.0) containing 0.2-0.25% (w/v) beta-octyl glucoside 
(β-OG) was applied to glow-discharged holey carbon grids (Quantifoil R2/2, 200 
mesh) and subsequently vitrified using an FEI Vitrobot. Frozen hydrated samples 
were imaged on an FEI Titan Krios at 300 kV with a Gatan K2 Summit direct 
detection camera in counting mode with 200 ms exposure per frame. Forty frames 
per micrograph were collected at a magnification of 29,000×, corresponding to 
1 Å per pixel at the specimen level. In total, 4,209 micrographs were collected at 
defocus values ranging from −1.3 to −2.5 μm. The movie frames were motion-corrected 
and dose-weighted by MotionCor2 and CTF parameters were 
estimated by CTFFIND4. 

RTIC complex in low monovalent salt buffer and Mg2+ (75 mM NaCl, 2 mM 
MgCl2, 10 mM Tris-HCl pH 8.0) containing 0.2% (w/v) β-OG was applied to 
glow-discharged lacey carbon grids (EMS, 200 mesh, Copper) and subsequently 
vitrified using a Leica EM GP. Frozen hydrated samples were imaged on a Tecnai 
F20 at 200 kV with a Gatan K2 Summit direct detection camera in counting mode 
with 200 ms exposure per frame. Sixty frames per micrograph were collected at a 
magnification of 29,000×, which corresponds to 1.286 Å per pixel at the specimen 
level. In total, 898 micrographs were collected at defocus values ranging from −2.0 
to −3.0 μm and a dose rate of 8.0 electrons per pixel per second. The micrograph 
movies were motion-corrected and dose-weighted as above, and CTF parameters 
were estimated by GCTF. 
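
The acquisition parameters above fix the exposure time per micrograph and, for the dataset with a stated dose rate, the accumulated dose per unit area. The following sketch works through that arithmetic; it is a back-of-the-envelope calculation from the quoted numbers, not a value reported by the authors.

```python
def exposure_seconds(frames, seconds_per_frame):
    """Total exposure time for one micrograph movie."""
    return frames * seconds_per_frame


def total_dose_per_square_angstrom(dose_rate_e_per_px_s, exposure_s, pixel_size_A):
    """Accumulated dose in electrons per square angstrom.

    One pixel covers pixel_size_A ** 2 square angstroms at the specimen.
    """
    return dose_rate_e_per_px_s * exposure_s / pixel_size_A ** 2


# Titan Krios dataset: 40 frames of 200 ms each.
krios_exposure = exposure_seconds(40, 0.2)

# Tecnai F20 dataset: 60 frames of 200 ms, 8.0 e-/pixel/s, 1.286 A per pixel.
f20_exposure = exposure_seconds(60, 0.2)
f20_dose = total_dose_per_square_angstrom(8.0, f20_exposure, 1.286)

print(f"Krios exposure: {krios_exposure:.1f} s")
print(f"F20 exposure: {f20_exposure:.1f} s, total dose: {f20_dose:.1f} e-/A^2")
```

The dose for the Titan Krios dataset is not computed because no dose rate is given for it in the text.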

Cryo-EM data processing. Cryo-EM data for the 8.0 and 4.5 Å maps were processed 
using Relion. 765,688 particle projections were semi-automatically 
picked from the motion-corrected micrographs, and sorted through subsequent 
rounds of reference-free 2D classification. 444,374 particle projections belonging to 
classes with well-defined RT and RNA features were selected for further processing 
(Extended Data Fig. 3a). An initial 3D model was obtained using VIPER based 
on the selected 2D classes, and used for 3D classification in Relion (Extended 
Data Fig. 3a). Because particle alignment was affected by the flexible protruding 
RNA, we used a mask and focused the alignment on RT and PBS alone. 167,906 
particle projections sorted to 3D classes displaying all features of RT and PBS 
were selected for subsequent 3D classifications. To further improve the quality of 
RT/PBS core, one more round of 3D classification with finer angular sampling was 
executed; particles from two classes with well-defined secondary structure densities 
were combined and the 3D structure was refined to a resolution of 4.5 Å. For global 
RTIC maps (including the flexible protruding RNA) a 3D classification without 
mask using the 167,906 particle projections subset was performed; eight classes 
obtained showed the tRNA and vRNA in various conformations. The class display- 
ing most of the RNA protrusions was refined to a resolution of 8.0 Å. The resolution 
reported is according to the 0.143 ‘gold standard’ Fourier shell correlation (FSC) 
criterion (Extended Data Fig. 3b). The 4.5 and 8.0 Å maps were corrected for the 
modulation transfer function (MTF) of K2 direct detection camera at 300 kV and 
then sharpened using B factors of −250 and −200 Å2, respectively, during the 
post-processing step (Extended Data Table 1). Local resolution was estimated using 
Relion (Extended Data Fig. 3c). 
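
To illustrate how a resolution is read off an FSC curve with the 0.143 criterion, the sketch below finds the spatial frequency at which the curve first drops below the threshold, using linear interpolation between shells, and reports its reciprocal. The curve values are synthetic, and the interpolation scheme is an assumption for illustration; it is not necessarily how Relion reports the value.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in angstroms) where the FSC first falls below threshold.

    spatial_freqs: shell frequencies in 1/angstrom, increasing.
    fsc_values: FSC value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc_values)):
        prev, cur = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > cur:
            # Linear interpolation between the two shells around the crossing.
            frac = (prev - threshold) / (prev - cur)
            f_cross = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / f_cross
    return None


# Synthetic FSC curve, for illustration only.
freqs = [0.05, 0.10, 0.20, 0.25]
fsc = [1.0, 0.9, 0.243, 0.043]
resolution = resolution_at_threshold(freqs, fsc)
print(f"resolution at FSC = 0.143: {resolution:.2f} A")
```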

Cryo-EM data for the 8.2 Å Mg2+ map were processed using Relion. 148,523 
particle projections were semi-automatically picked from motion-corrected micrographs 
and sorted through subsequent rounds of reference-free 2D classification. 
125,615 particle projections belonging to classes with well-defined RT and RNA 
features were selected for further processing (Extended Data Fig. 3e). An initial 
3D model was obtained using EMAN2 based on selected 2D classes“® and used for 
3D classification. The resolution reported is according to the 0.143 ‘gold standard’ 
FSC criterion (Extended Data Fig. 3f). The maps were corrected for the modu- 
lation transfer function (MTF) of K2 direct detection camera at 200kV and then 
sharpened using a B factor of —200 during the post-processing step (Extended 
Data Table 1). 
Model building and refinement. The crystal structure of RT bound to a DNA- 
DNA duplex, with nucleic acid substrate removed, was used as a starting model 
for RT’. After manually fitting the main-chain backbone of RT into its distinct 
density, four regions of EM density corresponding to RNA were apparent. The most 
notable region of density is the well-formed RNA helix in the cleft of RT, which 
corresponds to the PBS helix of the vRNA-tRNA complex. The complete model 
of the vRNA-tRNA complex was built piecewise, and iteratively, using the Rosetta 
FARFAR method. First, initial models of the first seven base pairs of the PBS 
helix (vRNA residues 181-187 and tRNA residues 71-77 with the extended dC, 
originally modelled as RNA for simplicity and later edited to a ddC in Coot) were 
built with FARFAR, then clustered. The centres of the ten most populated clusters 
were fit into the density using the colores tool in the Situs package. The resulting 
models were manually inspected and selected on the basis of fit to the density and 
proximity of G71 to C258 on RT (base and residue involved in crosslinking). This 
helix was extended out to nineteen base pairs (to include vRNA residues 187-199 
and tRNA residues 59-70), models were again clustered, and the cluster centres 
were fit into the density. The model that best fit the 4.5 Å cryo-EM density was 
selected. This RT-PBS model, called the RTIC core, was then refined using Phenix 
real-space refinement with secondary structure restraints in place for the RNA 
and protein. To further restrain the model during refinement, the N2-cystamine- 
deoxyguanosine was inserted into the tRNA sequence and a loose disulfide bond 
constraint with C258 was used during refinement (this was later reverted to a dG 
as there was no density for the linker atoms). The model was visually inspected and 
manually adjusted in Coot”. Protein residues lacking EM density, the vast majority 
of which were located in the fingers and palm subdomains, were removed after 
comparison to prior models of RT (Extended Data Fig. 4). Owing to insufficient 
resolution, large regions of RT did not exhibit reasonable density for sidechains. 
Therefore the RT model was truncated to a main-chain backbone before final 
inspection and submission to the PDB. The geometry of the final refined model 
was validated using Molprobity. This refined RTIC core model (Fig. 3) served as 
the anchor point for orienting the vRNA and tRNA portions of the global RTIC 
model. 

Two additional regions of RNA density were located near the fingers subdomain 
of RT (Fig. 2a). As the crosslinking method used to form the complex harnesses 
RT polymerase activity, these regions of density correspond to the template vRNA 
helices. We traced the vRNA template strand out of the active site, allowing us to 
confidently orient the base of vRNA H2. Models of vRNA H2 (residues 134-178) 
were built with FARFAR based on the consensus secondary structure from past 
biochemical and biophysical data. These models were clustered and fit into 
the density. 

After positioning VRNA H2, only one region of RNA density near the fingers 
remained unaccounted. This density, which was continuous with the H2 density, 
corresponds to vRNA H1 (residues 125-131 and 217-223). This density also con- 
nects to RNA located in the cleft of RT near the RNase H domain. This suggests that 
the connection loop may contact H1 and contribute to the density observed in this 
region. To confirm the presence of H1 in the RTIC, single-molecule experiments 
were performed in which a FRET pair was placed on the 5’ and 3’ ends of the helix. 
We find that in our imaging conditions, 85-95% of molecules exist in a stable 
high-FRET state, consistent with H1 formation. vRNA helix 1 was modelled as an ideal 
A-form helix, then fit into the density using UCSF Chimera. This initial fit was 
later refined during the global model building (described below). The connecting 
loop was partially built in Coot starting from vRNA residues 216 and 204. After 
manual fitting of the first several bases into the density, the rest of the connecting 
loop was built using Rosetta and minimized. A model with close fit to the density 
was chosen for later refinement. 

Using the same approach as for the global orientation of the vRNA helices, 
we find that the fourth and final region of RNA density, located near the RNase 
H domain of RT, corresponds to the remaining portion of tRNALys3. While both 
vRNA helices exhibited density consistent with past secondary structure models, 


the tRNA density appeared to differ. Instead of revealing expected density for 
the two independently folded anti-codon and D-stem loops of the tRNA, the 
RTIC global map showed density consistent with a continuous helix. Also notable 
was that this helix extends directly from the PBS helix. After re-examining the 
sequences of the vRNA and tRNA, we noted that it was possible for the vRNA 
and tRNALys3 to form four extra base pairs, which would extend the PBS from 
18 to 22 bp. This extended PBS would be consistent with the continuous helical 
density in this region. Three out of the four pairs are conserved among subtype-B 
HIV-1 sequences, suggesting that this structural feature is common. The most 
variable position would pair with the m1A at position 58 of tRNALys3. Variability 
in this position is not unexpected, as a Watson-Crick pair would not be able to 
form. We generated a second, alternative fold for the remaining portions of the 
tRNALys3 using mFOLD. This secondary structure, which differed from the previously 
observed free-form secondary structure by a free energy of less than 1 kcal/mol, is 
consistent with a long helical structure and accounts for the density observed in 
the RTIC global map. In addition to being a good fit for the density, the bulges in 
this model are consistent with the locations of modified nucleotides that would 
exist in human tRNALys3. This model also sequesters the anticodon bases of the 
tRNA, in agreement with chemical mapping data that suggest that these bases are 
paired. We note that the apical portion of the extended tRNA helix has very 
weak EM density and is likely to be dynamic. This dynamic nature of the tRNA is 
illustrated by the wide variety of final conformations seen in the 3D classes of the 
RTIC. Models of the extended tRNA helix (residues 2-53) were built individually, 
clustered, and fit into the density. 

The models of the nineteen-base-pair extended-PBS helix, vRNA helix 1, vRNA 
helix 2, and the extended tRNA helix were grafted together, with connecting 
regions built de novo with FARFAR. Coordinate restraints were applied based 
on the initial fits to the density for each of these four regions. These penalties were 
applied for deviations in positions of more than 10 Å. The best-scoring models 
were fit into the density in Chimera and a single model was manually selected 
for further refinement. Regions with the worst agreement with the density, as 
observed by manual inspection, were subjected to further iterations of FARFAR 
rebuilding and density fitting. The final vRNA-tRNA model was merged with a 
poly-alanine backbone RT model and refined with one round of Phenix real-space 
refinement using secondary structure restraints. Owing to the inclusion of all 
vRNA and tRNA bases found in our RNA constructs into the model, the model 
building and refinement procedure may force potentially disordered regions to fit 
into the 8.0 Å cryo-EM density map. We stress that the global model presented in 
the manuscript is meant to aid in interpreting the orientation of the vRNA and 
tRNA helices with respect to RT and its active sites while showing that the density 
can encompass most RNA elements. The model should not be used to interpret 
individual base locations or conformations. For the creation of models for classes 
3, 4, and the Mg2+ map, models of the vRNA H1, vRNA H2, and tRNA were taken from 
the global model described above. These helical regions were treated as rigid bodies 
and only the connecting hinge regions (Extended Data Fig. 7c) were rebuilt using 
the protocol described above. All figures for the RTIC core and global models were 
prepared in Chimera. 
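
The coordinate restraints used during grafting are described only as penalties applied for deviations of more than 10 Å. One common way to express such a restraint is a flat-bottom penalty, sketched below. The harmonic form outside the tolerance and the unit weight are assumptions for illustration and may differ from the functional form used by Rosetta.

```python
import math


def flat_bottom_penalty(position, reference, tolerance=10.0, weight=1.0):
    """Zero penalty within `tolerance` angstroms of the reference position,
    growing quadratically with the excess distance beyond it."""
    deviation = math.dist(position, reference)
    excess = max(0.0, deviation - tolerance)
    return weight * excess ** 2


# An atom 5 A from its fitted position is not penalized ...
inside = flat_bottom_penalty((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
# ... whereas an atom 12 A away exceeds the tolerance by 2 A.
outside = flat_bottom_penalty((0.0, 0.0, 12.0), (0.0, 0.0, 0.0))
print(inside, outside)
```

A restraint of this shape lets each helix move freely near its initial fit while discouraging large departures from the density.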

Activity assays. For all activity assays, the RTIC, RT, and vRNA-tRNA were puri- 
fied as described above. 

Time-course assay. RTIC (200 nM) was preincubated for 20 min at 37°C in 50 mM 
Tris-HCl, pH 8.0, 50 mM KCl, 2.5 mM MgCl2. Free vRNA-tRNA (200 nM) and 
RT (241M) were also preincubated for 20 min under the same conditions, but with 
dCTP in order to fully incorporate the first nucleotide before dTTP incorporation. 
Incorporation reactions were started by adding a mixture of α-32P-dTTP (50 nM) 
and dTTP (50 μM). Reactions were quenched at a range of times from 1 s to 4 h 
with the addition of EDTA and SDS loading buffer. The reactions were run on a 
4-20% SDS-PAGE gel, dried, and exposed for 18 h on a phosphoimager screen 
(Molecular Dynamics) and imaged with a Storm 860 (Molecular Dynamics). Bands 
were quantified using ImageQuant. Intensity was normalized to the highest band 
intensity for the individual time course assays after background subtraction (set to 1). 
All time course assays were reliably reproduced and the slow reactions required 
no special equipment. Plotting and curve fitting was done using IgorPro. For 
NNRTI experiments, 1 μM nevirapine was added to the pre-reaction incubation 
mixture of the RTIC. 

Relative total incorporation assay. Reactions were performed as described 
above. RTIC reaction mixtures were quenched at 1 h and the free RT and vRNA-tRNA 
were quenched at 30 min. Samples were quantified as described above. 
Incorporation was normalized to the average free RT + vRNA-tRNA band inten- 
sity (set to 100%). Relative total incorporation assays were done in triplicate. 
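
The two normalizations described for the time-course and relative total incorporation assays can be sketched as follows. The band intensities are made-up numbers in arbitrary units, and the function names are hypothetical; only the normalization rules come from the text.

```python
def normalize_time_course(raw_intensities, background):
    """Subtract background, then scale so the highest band equals 1."""
    corrected = [value - background for value in raw_intensities]
    peak = max(corrected)
    if peak <= 0:
        raise ValueError("no band exceeds the background")
    return [value / peak for value in corrected]


def relative_incorporation(sample_intensity, reference_replicates):
    """Express a band as a percentage of the mean reference band (set to 100%)."""
    reference_mean = sum(reference_replicates) / len(reference_replicates)
    return 100.0 * sample_intensity / reference_mean


# Made-up band intensities, for illustration only.
normalized = normalize_time_course([110.0, 260.0, 510.0, 610.0], background=10.0)
relative = relative_incorporation(45.0, [90.0, 100.0, 110.0])
print(normalized, relative)
```

Here the reference replicates stand in for the free RT + vRNA-tRNA bands, and the sample for an RTIC band quantified on the same gel.
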
Reverse transcriptase assay. vRNA-tRNA complexes were purified as described 
above using a tRNA that was labelled on the 5’ end with cyanine3 dye. Reactions 
were pre-incubated at 37 °C for 5 min in 50 mM Tris-HCl (pH 8.0), 50 mM KCl, 
6 mM MgCl2, and 5 mM β-met at a vRNA-tRNA concentration of 200 nM and RT 
concentration of 3 μM. Reactions were initiated by the addition of a dNTP mixture 
that brought the final individual dNTP concentrations to 100 μM. Reactions were 
performed in triplicate and quenched at 30 min with EDTA (50 mM). Samples 
were denatured in a formamide loading buffer, heated for 5 min at 95°C, and 
loaded on an 8.5% polyacrylamide gel that was pre-run for 2h. Samples were run 
for 3h at 120 W before imaging with a Typhoon Trio (Amersham Biosciences). 
Fully extended and unextended primer bands were quantified using ImageQuant. 
Percent primer extension was calculated and normalized to wild-type RT. 
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Cryo-EM maps of the global RTIC, the core of the RTIC, and 
the global RTIC with MgCl₂ have been deposited in the Electron Microscopy Data 
Bank under accession codes EMDB-7032, EMDB-7031 and EMDB-7540. The 
coordinates of the RTIC core model have been deposited in the Protein Data Bank 
under accession code 6B19. The global RTIC model is available as Supplementary 
Data. All other data are available from the corresponding author upon reasonable 
request. 


32. Götte, M. et al. HIV-1 reverse transcriptase-associated RNase H cleaves RNA/ 
RNA in arrested complexes: implications for the mechanism by which RNase 
H discriminates between RNA/RNA and RNA/DNA. EMBO J. 14, 833-841 
(1995). 

33. Marshall, R.A., Dorywalska, M. & Puglisi, J. D. Irreversible chemical steps control 
intersubunit dynamics during translation. Proc. Natl Acad. Sci. USA 105, 
15364-15369 (2008). 

34. Aitken, C. E., Marshall, R. A. & Puglisi, J. D. An oxygen scavenging system for 
improvement of dye stability in single-molecule fluorescence experiments. 
Biophys. J. 94, 1826-1835 (2008). 

35. Johansson, M., Chen, J., Tsai, A., Kornberg, G. & Puglisi, J. D. Sequence- 
dependent elongation dynamics on macrolide-bound ribosomes. Cell Rep. 7, 
1534-1546 (2014). 

36. O'Leary, S. E., Petrov, A., Chen, J. & Puglisi, J.D. Dynamic recognition of the 
mRNA cap by Saccharomyces cerevisiae eIF4E. Structure 21, 2197-2207 
(2013). 

37. Aitken, C.E. & Puglisi, J. D. Following the intersubunit conformation of the 
ribosome during translation in real time. Nat. Struct. Mol. Biol. 17, 793-800 
(2010). 


38. Chen, J., Tsai, A., Petrov, A. & Puglisi, J. D. Nonfluorescent quenchers to correlate 
single-molecule conformational and compositional dynamics. J. Am. Chem. Soc. 
134, 5734-5737 (2012). 

39. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion 
for improved cryo-electron microscopy. Nat. Methods 14, 331-332 (2017). 

40. Rohou, A. & Grigorieff, N. CTFFIND4: Fast and accurate defocus estimation from 
electron micrographs. J. Struct. Biol. 192, 216-221 (2015). 

41. Zhang, K. Gctf: Real-time CTF determination and correction. J. Struct. Biol. 193, 
1-12 (2016). 

42. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM 
structure determination. J. Struct. Biol. 180, 519-530 (2012). 

43. Scheres, S. H. Semi-automated selection of cryo-EM particles in RELION-1.3. J. 
Struct. Biol. 189, 114-122 (2015). 

44. Scheres, S. H. Processing of structurally heterogeneous cryo-EM data in 
RELION. Methods Enzymol. 579, 125-157 (2016). 

45. Penczek, P. A., Grassucci, R. A. & Frank, J. The ribosome at improved resolution: 
new techniques for merging and orientation refinement in 3D cryo-electron 
microscopy of biological particles. Ultramicroscopy 53, 251-270 (1994). 

46. Tang, G. et al. EMAN2: an extensible image processing suite for electron 
microscopy. J. Struct. Biol. 157, 38-46 (2007). 

47. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta 
Crystallogr. D 60, 2126-2132 (2004). 

48. Wriggers, W. Conventions and workflows for using Situs. Acta Crystallogr. D 68, 
344-351 (2012). 

49. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for 
macromolecular structure solution. Acta Crystallogr. D 66, 213-221 (2010). 

50. Davis, I. W. et al. MolProbity: all-atom contacts and structure validation for 
proteins and nucleic acids. Nucleic Acids Res. 35, W375-W383 (2007). 

51. Lavender, C. A., Gorelick, R. J. & Weeks, K. M. Structure-based alignment and 
consensus secondary structures for three HIV-related RNA genomes. PLoS 
Comput. Biol. 11, e1004230 (2015). 

52. Pettersen, E. F. et al. UCSF Chimera: a visualization system for exploratory 
research and analysis. J. Comput. Chem. 25, 1605-1612 (2004). 

53. Isel, C., Ehresmann, C., Keith, G., Ehresmann, B. & Marquet, R. Initiation of 
reverse transcription of HIV-1: secondary structure of the HIV-1 RNA/ 
tRNA(3Lys) (template/primer). J. Mol. Biol. 247, 236-250 (1995). 

54. Isel, C. et al. Specific initiation and switch to elongation of human 
immunodeficiency virus type 1 reverse transcription require the post- 
transcriptional modifications of primer tRNA3Lys. EMBO J. 15, 917-924 (1996). 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Fig. 1 image, panels a-h. Recoverable labels: Elution Volume (CV); Elution Volume (mL); RT; RTIC; vRNA/tRNA; vRNA; p66-vRNA/tRNA; NNRTI; free RT + vRNA/tRNA; WT, Q258C, Q258C/E478Q.] 


Extended Data Fig. 1 | Purification and activity of RTIC. a, Initial anion- 
exchange purification of the RTIC away from free RT and vRNA-tRNA. 
This purification was repeated for each sample (>10) used in the 
manuscript, with only slight variations in the chromatogram. b, Polishing 
step using size-exclusion chromatography purification of the RTIC after 
anion exchange. This purification was repeated for each sample used in the 
manuscript (>10), with only slight variations in the chromatogram. c, A 
final 10% native TBE gel on the purified components. RT barely enters the 
gel under these running conditions. The RTIC runs as a single band, but 
trace amounts of free vRNA and/or vRNA-tRNA complex are sometimes 
present. This native gel is a representative result that was repeated 
independently for all purified RTIC samples used in the paper (>10). 
d, Autoradiograph image illustrating that the RTIC is capable of 
incorporating an incoming α-³²P-dTTP nucleotide when extended and 
purified using dCTP instead of ddCTP. This gel is a representative result 
that was repeated independently for crosslinked and uncrosslinked 
samples (>6 independently prepared samples) used in dTTP 
incorporation assays. e, The RTIC incorporates α-³²P-dTTP at roughly 
89% efficiency compared to the free components after reaching a plateau. 
Values are mean ± s.d. (n = 3 independent experiments) with 
normalization to total incorporation of free RT + vRNA-tRNA reactions. 
f, Autoradiograph image showing that the incorporation of dTTP is 
inhibited in the presence of nevirapine (NNRTI). Images have been 
adjusted to allow identification of the NNRTI-inhibited band. This gel is a 
representative result that was repeated independently for crosslinked and 
uncrosslinked samples (3 samples each). g, Relative activities, judged by 
primer usage, of wild-type, Q258C, and Q258C/E478Q reverse 
transcriptase mutants used in this study. Values are mean ± s.d. 
(n = 3 independent experiments) with normalization to the primer usage 
of wild-type RT. h, RTIC (triangles), RTIC with NNRTI (circles) or 
vRNA-tRNA + excess RT (squares) reactions were initiated by addition 
of α-³²P-dTTP and quenched at different time points. Data for the free 
vRNA-tRNA + RT reaction were fit using the relationship 
$\mathrm{Intensity} = A(1 - e^{-k_{\mathrm{pol}} t}) + B(1 - e^{-k_{\mathrm{slow}} t})$. 
Data for the RTIC (with or without NNRTI) reaction were fit using the relationship 
$\mathrm{Intensity} = B(1 - e^{-k_{\mathrm{slow}} t})$, 
where A and B represent the amplitudes of the fast 
and slow processes, respectively, $k_{\mathrm{pol}}$ is the apparent extension rate 
constant, and $k_{\mathrm{slow}}$ is the rate of the slow process. The second relationship 
was used for the RTIC data, as the slow process appears to dominate 
incorporation when the vRNA-tRNA substrate is crosslinked to RT. The 
best fits were obtained with: A = 0.7166 AU, $k_{\mathrm{pol}}$ = 0.1078 s⁻¹, B = 0.2754, 
$k_{\mathrm{slow}}$ = 0.01002 s⁻¹ for the vRNA-tRNA + excess RT; B = 0.9808, 
$k_{\mathrm{slow}}$ = 0.003140 s⁻¹ for the RTIC; and B = 1.095, $k_{\mathrm{slow}}$ = 0.0001714 s⁻¹ for 
the RTIC with NNRTI. $k_{\mathrm{slow}}$ is about 3.19 times slower for crosslinked 
RTIC than for un-crosslinked components. Assays were independently 
repeated three times to ensure reproducibility. 
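
The two fit relationships in panel h can be evaluated directly from the reported parameters. The sketch below is illustrative only: it is not the IgorPro fitting procedure, the function and dictionary names are assumptions, and the half-time is a derived quantity that the caption does not report.

```python
import math


def biphasic(t, A, k_pol, B, k_slow):
    """Fast plus slow exponential rise, used for free vRNA-tRNA + RT."""
    return A * (1 - math.exp(-k_pol * t)) + B * (1 - math.exp(-k_slow * t))


def single_phase(t, B, k_slow):
    """Single slow exponential rise, used for the crosslinked RTIC."""
    return B * (1 - math.exp(-k_slow * t))


# Best-fit values quoted in the caption (rate constants in s^-1).
FREE = {"A": 0.7166, "k_pol": 0.1078, "B": 0.2754, "k_slow": 0.01002}
RTIC = {"B": 0.9808, "k_slow": 0.003140}
RTIC_NNRTI = {"B": 1.095, "k_slow": 0.0001714}

# Ratio of slow-phase rate constants, free components versus crosslinked RTIC.
slow_rate_ratio = FREE["k_slow"] / RTIC["k_slow"]

# Derived illustration: time for the RTIC curve to reach half its amplitude.
rtic_half_time_s = math.log(2) / RTIC["k_slow"]
```

The ratio reproduces the roughly 3.19-fold difference in the slow rate constant stated in the caption.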
Extended Data Fig. 2 | Representative negative-stain EM images, 
cryo-EM images, and 2D averages of the RTIC. a, Representative 
negative-stain EM image of HIV RTIC reveals a mono-disperse sample 
that is free of aggregates. Approximately a dozen images were taken of 
each sample before cryo-EM grid preparation to ensure sample quality. 
b, Cryo-EM image of RTIC without β-OG. The long chains correspond 
to RNA from the complex with very few particles resembling the protein. 
Results are reproducible in the absence of β-OG (>10 samples tested). 
c, Cryo-EM image of RTIC with β-OG. Single particles corresponding to 
the complex appear similar to the negative-stain visualization. All 5,107 
images used in both cryo-EM datasets have a similar appearance with 
slight differences in particle density. d, Representative 2D averages of 
RTIC complex from the cryo-EM data collected with β-OG. Both datasets 
exhibit very similar 2D classes. 
Extended Data Fig. 3 | Data processing workflow for RTIC complex. 
a, Data processing workflow for the 8.0 Å global and 4.5 Å core maps. 
b, Gold standard FSC curve of RTIC core and global maps. c, The final 
4.5 Å map is coloured according to local resolution estimated by Relion. 
e, Data processing workflow for the 8.2 Å global Mg²⁺ map. f, Gold 
standard FSC curve of RTIC Mg²⁺ global map. 



Extended Data Fig. 4 | Quality of the cryo-EM density for the core 
RTIC map. a, View of HIV-1 RT from the front. The subdomains of RT 
are coloured. Underneath the main RTIC view, each subdomain of RT, 
plus the p51 subunit, is shown fit into the 4.5 Å map. b, View of HIV-1 RT 
from the polymerase active site side. The subdomains of RT are coloured. 
Underneath the main RTIC view, each subdomain of RT, plus the p51 
subunit, is shown fit into the 4.5 Å map. In a, b, regions of protein, namely 
loops and linkers, that lacked sufficient density were removed after 
comparison with previously published structures of RT. These regions 
are indicated by dotted lines and are most commonly found in the finger 
and palm subdomains. c, Representative regions of the 4.5 Å map fitted with 
protein secondary structure that display densities for side chains. A view of 
the PBS helix fit into the 4.5 Å map is also shown; phosphates of the RNA 
backbone are partially resolved. Regions are coloured with respect to the 
main text models. 
Extended Data Fig. 5 | Mg²⁺ global map views and structure 
comparison. a, Side and top views of the 8.2 Å global map at different 
density thresholds. The orientation of the peripheral vRNA and tRNA 
elements is within the variability seen among the different RTIC 
conformers. b, A model of the RTIC built into the Mg²⁺ density using the 
main text global RTIC model. vRNA and tRNA helices were treated as 
rigid bodies derived from the main text model (see Extended Data Fig. 6 and 
Methods). c, Comparison of the global RTIC model RNA (grey) with the 
Mg²⁺ model RNA (coloured). All three regions of RNA structure (H1, H2, 
and tRNA) differ in the Mg²⁺ model, but are adequately described by 
rigid body movements of the RNA helical elements taken from the global 
RTIC model. Both H1 and H2 represent a substantial structural barrier 
to initiation. d, Partial accommodation of H1 into high monovalent salt 
classes 3, 4 and 7. 
Extended Data Fig. 6 | Low-resolution tRNA density and fold 
comparison. a, Top and side views of the elongated helical tRNA density 
observed in the low-resolution global map of the RTIC. b, Top and side 
views of the vRNA-tRNA model generated using the hypothesized 
elongated tRNA helical fold. The tRNA model fits the long helical density 
well. Corresponding secondary structure is in d. c, Top and side views 
of the vRNA-tRNA model generated using previously hypothesized 
tRNA secondary structures that have the anticodon and D-stem loops 
independently folded. Corresponding secondary structure is in 
e. d, Secondary structure depiction of the new vRNA-tRNA and canonical 
clover-leaf fold of the tRNA. The different domains are coloured and 
correspond with the models in panels b and c. e, Secondary-structure 
depiction of the old vRNA-tRNA fold with independent anticodon and 
D-stem loops. The domains are coloured and correspond with the model 
in c and clover-leaf fold of the tRNA in d. 
Extended Data Fig. 7 | Peripheral RNA heterogeneity of the RTIC 
conformers. a, Tiled views of eight conformations emerging from 3D 
classification of RTIC. Each class is numbered and class 7 was used for the 
global RTIC reconstruction. b, Superposition of the eight classes from a. 
The main areas of RNA heterogeneity are focused on the orientations of 
vRNA H2, H1 and the connection loop, and the tRNA. With no stabilizing 
protein contacts, vRNA H2, H1, and the tRNA sample a wide range of 
conformations, limiting the resolution of the global map. c, Additional 
RTIC models built into classes 3 (tan) and 4 (blue). The models for the 
tRNA, vRNA H1, and vRNA H2 were all derived from the global RTIC 
model and treated as rigid bodies for model building. The connecting loop 
was not built in these models as the density for this region was not clear in 
these maps, though there is reasonable density to model a loop near H1. 
Junctions between the helices serve as hinges that allow movement of the 
independent domains. The main text global RTIC model (grey) is included 
as a comparison. d, The vRNA and tRNA helices treated as rigid bodies for 
modelling are shown in bold. Hinge points for each helix are highlighted 
with grey circles and serve as points of flexibility for the RTIC. 
Extended Data Fig. 8 | Single-molecule experimentation and analysis. 
a, Secondary structure depiction of the vRNA-tRNA construct used for 
single-molecule experiments. The labelling scheme is shown, with the Cy3 
dye located on the 5′ end of the vRNA helix 1 and Cy5 dye located on an 
oligonucleotide positioned near the 5′ end of helix 1. The vRNA-tRNA 
complex was crosslinked to RT for the experiments. b, Ninety-five per 
cent of the RTIC complexes are in the high FRET, helix 1 formation, state 
(480 traces analysed, see Methods). c, Example of the traces used for 
final FRET analysis, showing the high FRET state of the RTIC complex, which 
is attributed to helix 1 formation. Photobleaching events for both Cy5 
and Cy3 are indicated. d, Examples of traces removed from final FRET 
analysis. Traces exhibit the presence of multiple molecules (multiple 
single-dye photobleaching events) or poor dye behaviour (blinking and 
quenching). 
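
As a rough illustration of how traces like those in panels b and c could be summarized, the sketch below computes an uncorrected apparent FRET efficiency from donor (Cy3) and acceptor (Cy5) intensities and the fraction of traces in a high FRET state. The 0.5 threshold, the intensities, the split of the 480 traces and the function names are assumptions for the example; the analysis actually used is described in the Methods, which are not reproduced here.

```python
def apparent_fret(donor, acceptor):
    """Uncorrected apparent FRET efficiency: I_A / (I_D + I_A)."""
    total = donor + acceptor
    if total <= 0:
        raise ValueError("no fluorescence signal")
    return acceptor / total


def fraction_high_fret(efficiencies, threshold=0.5):
    """Fraction of traces at or above the (assumed) high-FRET threshold."""
    if not efficiencies:
        raise ValueError("no traces supplied")
    high = sum(1 for value in efficiencies if value >= threshold)
    return high / len(efficiencies)


# Hypothetical data set of 480 traces: 456 high-FRET and 24 low-FRET.
high_trace = apparent_fret(donor=200.0, acceptor=800.0)
low_trace = apparent_fret(donor=800.0, acceptor=200.0)
traces = [high_trace] * 456 + [low_trace] * 24
high_fraction = fraction_high_fret(traces)
```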
RT-nucleic acid complexes in the cryo-EM map. All alignments between 
structures and the RTIC were done using the p51 subunit. a, Comparison 
of an active conformation RT-nucleic acid structure (pink, 1RTD) with 
the RTIC core (RT, purple; tRNA primer, red; vRNA template, yellow). 
The EM map overlay shows the poor fit of the 1RTD model in the fingers, 
thumb, and primer grip of RT. Deviations of the nucleic acid primer and 
template of 1RTD away from the RTIC density are also apparent. 
b, Comparison of an NNRTI-bound RT-nucleic acid structure (dark grey, 
3V81) with the RTIC core. The EM map overlay shows the closer fit of 
the fingers and primer grip regions of RT in the 3V81 model. The thumb 
region also overlays well, but with slight deviations. Most noticeably, the 
nucleic acid primer/template in the 3V81 model deviates, although not as 
dramatically as in 1RTD, from the RTIC core EM density. 
Extended Data Table 1 | Cryo-EM data collection, refinement, and validation statistics 

                                  RTIC Core       RTIC Global            RTIC Global w/MgCl₂ 
                                  (EMDB-7031)     (EMDB-7032)            (EMDB-7540) 
                                  (PDB-6B19)      (Supplementary Data 
                                                  file for model) 

Data collection and processing 
  Magnification (calibrated)      50,000          50,000                 38,880 
  Voltage (kV)                    300             300                    200 
  Electron exposure (e⁻/Å²)       60 and 85       60 and 85              715 
  Defocus range (μm)              -1.3 to -2.5    -1.3 to -2.5           -2.0 to -3.0 
  Pixel size (Å)                  1.0             1.0                    1.286 
  Symmetry imposed                C1              C1                     C1 
  Initial particle images (no.)   765,688         765,688                148,523 
  Final particle images (no.)     128,153         21,520                 67,346 
  Map resolution (Å)              4.5             8.0                    8.2 
  FSC threshold                   0.143           0.143                  0.143 

Refinement 
  Initial model used (PDB code)   3V81            3V81 
  Map sharpening B factor (Å²)    -250            -200                   -200 

Model composition 
  Non-hydrogen atoms              5,299           8,545 
  Protein residues                909             962 
  RNA nucleotides                 38              178 

R.m.s. deviations 
  Bond lengths (Å)                0.023           0.003 
  Bond angles (°)                 1.603           0.83 

Validation 
  MolProbity score                2.41            1.92 
  Clashscore                      15.00           14.25 
  Poor rotamers (%)               N/A             N/A 

Ramachandran plot 
  Favored (%)                     80.46           96.15 
  Allowed (%)                     19.42           3.85 
  Disallowed (%)                  0.12            0.00 
Structure of the alternative complex III in a 
supercomplex with cytochrome oxidase 
Alternative complex III (ACIII) is a key component of the 
respiratory and/or photosynthetic electron transport chains of 
many bacteria¹⁻³. Like complex III (also known as the bc1 complex), 
ACIII catalyses the oxidation of membrane-bound quinol and 
the reduction of cytochrome c or an equivalent electron carrier. 
However, the two complexes have no structural similarity*’. 
Although ACIII has eluded structural characterization, several of 
its subunits are known to be homologous to members of the complex 
iron-sulfur molybdoenzyme (CISM) superfamily*, including the 
proton pump polysulfide reductase®!”. We isolated the ACIII 
from Flavobacterium johnsoniae with native lipids using styrene 
maleic acid copolymer¹¹⁻¹⁴, both as an independent enzyme and 
as a functional 1:1 supercomplex with an aa3-type cytochrome 
c oxidase (cyt aa3). We determined the structure of ACIII to 3.4 Å 
resolution by cryo-electron microscopy and constructed an atomic 
model for its six subunits. The structure, which contains a [3Fe-4S] 
cluster, a [4Fe-4S] cluster and six haem c units, shows that ACIII 
uses known elements from other electron transport complexes 
arranged in a previously unknown manner. Modelling of the cyt 
aa3 component of the supercomplex revealed that it is structurally 
modified to facilitate association with ACIII, illustrating the 
importance of the supercomplex in this electron transport chain. 
The structure also resolves two of the subunits of ACIII that are 
anchored to the lipid bilayer with N-terminal triacylated cysteine 
residues, an important post-translational modification found in 
numerous prokaryotic membrane proteins that has not previously 
been observed structurally in a lipid bilayer. 

The ACIII-cyt aa3 supercomplex from F. johnsoniae membranes 
was solubilized, purified and biochemically characterized using styrene 
maleic acid (SMA) copolymer nanodiscs without traditional 
detergents (Supplementary Discussion, Extended Data Figs. 1-3). The 
supercomplex catalyses the two-electron oxidation of menaquinol (or 
ubiquinol) and the four-electron reduction of oxygen to water with 
a turnover number of around 21 electrons per second without the 
addition of exogenous cyt c (Supplementary Information, Extended 
Data Fig. 3), indicating a functional electron transfer chain within the 
supercomplex. The addition of exogenous cyt c did not increase the 
rate of electron transfer. The structure of the ACIII-cyt aa3 supercomplex 
in SMA nanodiscs was determined by cryo-electron microscopy 
(cryo-EM) (Fig. 1, Extended Data Fig. 4). The supercomplex has a 
mass of 464 kDa (Supplementary Discussion), a transmembrane cross-section 
of approximately 9 nm × 13 nm (Extended Data Fig. 5), and 
contains 48 transmembrane α-helices. To our knowledge, the ACIII-cyt aa3 
supercomplex is the largest protein complex reported to be contained 
within an SMA copolymer nanodisc. The SMA copolymer and 
lipids contribute only a thin layer of density around the supercomplex 
(Fig. 1a, b), which is not circular but follows the contours of the protein. 


Whether this is a general feature of SMA-solubilized proteins or is due 
to the large size of the ACIII-cyt aa3 supercomplex is not known, 
and will be clarified when more structures are determined using this 
approach. The number of loosely bound, unresolved lipid molecules 
is not known, nor is it known whether they are sufficient in number to 
form a true bilayer surrounding the protein. The SMA-supercomplex 
nanodiscs retain native lipids, are more stable and have 30% higher 
specific activity than the supercomplex isolated with detergents (for 
example, dodecylmaltoside) (Supplementary Discussion, Extended 
Data Fig. 3). Because traditional detergents are avoided in generating 
SMA nanodiscs, the preparative protocol is more rapid and simpler 
than making nanodiscs using membrane scaffold proteins. 
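
The electron bookkeeping behind the turnover number quoted above can be made explicit with a short calculation. This is only a worked arithmetic sketch: the function name is hypothetical, and the per-substrate rates are derived here from the stated stoichiometries rather than reported in the text.

```python
ELECTRONS_PER_QUINOL = 2  # two-electron oxidation of menaquinol (or ubiquinol)
ELECTRONS_PER_O2 = 4      # four-electron reduction of oxygen to water


def substrate_turnover(electrons_per_second):
    """Convert an electron turnover number into substrate turnover rates."""
    return {
        "quinol_per_s": electrons_per_second / ELECTRONS_PER_QUINOL,
        "o2_per_s": electrons_per_second / ELECTRONS_PER_O2,
    }


# Turnover number of around 21 electrons per second, as stated in the text.
rates = substrate_turnover(21.0)
```

Under these stoichiometries, roughly 21 electrons per second corresponds to about 10.5 quinol molecules oxidized and about 5.25 oxygen molecules reduced per second per supercomplex.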

Although the properties of the SMA nanodiscs are less well charac- 
terized than nanodiscs made with membrane scaffold proteins? 15 our 
work demonstrates the utility of SMA nanodiscs for high-resolution 
structural studies of membrane proteins. 

The resolution of the cryo-EM density map enabled construction of 
an atomic model for more than 90% of the sequences predicted from 
the ACIII gene cluster (Supplementary Discussion), including subunits 
ActA, ActB, ActC, ActD, ActE and ActF (Fig. 2, Extended Data Fig. 5, 
Extended Data Table 1). Sequence analysis shows that ACIII contains 
a unique combination of known modules from other respiratory 
complexes’ (Supplementary Discussion). The ACIII structure confirms 
this prediction and shows the structure responsible for catalysing the 
quinol:cyt c oxidoreductase activity. The ACIII structure can be divided 
into three parts: a core assembly of ActC and ActB that oxidizes quinol; 
a haem c assembly consisting of ActA and ActE that directs electrons 
from ActB to the terminal electron acceptor; and auxiliary transmem- 
brane subunits ActD and ActF with unknown functions. With some 
key differences (Extended Data Fig. 5), the overall architecture of ActB 
and ActC resembles the complex consisting of the PsrA, PsrB and PsrC 


Fig. 1 | Cryo-EM of the ACIII-cyt aa3 supercomplex in SMA nanodiscs. 
a, Two representative 2D class average images of the ACIII-cyt aa3 
supercomplex in a nanodisc. Calculation of 2D class averages was 
not repeated. b, Side (left) and top (right) views of the ACIII-cyt aa3 
supercomplex cryo-EM map, with cyt aa3 and ACIII labelled. The transparent surface indicates the 
boundary of the nanodisc. Scale bars, 50 Å. 
subunits of polysulfide reductase from Thermus thermophilus 
(PsrABC)!®, a member of the CISM superfamily (Supplementary 
Discussion). Like PsrC, ActC contains no cofactors, but it does 
contain the proposed site for the oxidation of menaquinol. Residues 
at the menaquinol-binding site identified in PsrC’® are not conserved 
in ActC!”. Although menaquinone is not observed in the cryo-EM 
map, we propose that the ActC residues His133 and Asp164 form the 
menaquinol-binding site in ActC near the interface with ActB 
(Extended Data Fig. 6). These two residues are conserved in ActC 
sequences and there is a crevice between transmembrane helices 3 
(TM3) and 4 (TM4) of ActC that would provide access to the substrate 
in the membrane bilayer. 

The N-terminal portion of ActB is homologous to the PsrA subunit 
of polysulfide reductase, which contains the molybdenum cofactor, but 
the molybdenum cofactor is absent in ActB*. The C-terminal domain 
of ActB is homologous to PsrB, and both ActB and PsrB contain iron-sulfur 
clusters. Like PsrB, ActB from F. johnsoniae is expected to contain 
four iron-sulfur clusters, but only two are observed in the cryo-EM 
map (Extended Data Fig. 7). There is one [3Fe-4S] cluster near the 
interface with ActC, about 10 Å from the proposed site of menaquinol 
oxidation, and one [4Fe-4S] cluster about 9 Å further away. There are 
two additional cysteine clusters present in the structure of ActB, but 
the cryo-EM map does not show iron-sulfur clusters at these locations. 
Instead, we observe disulfide bonds (Cys965-Cys938 and Cys971- 
Cys769) within these two cysteine clusters in ActB. The substitution of 
proposed [4Fe-4S] clusters by disulfide bonds may be a genuine aspect 
of the structure or may result from oxidation that occurred during 
sample preparation. However, if these two ‘missing’ [4Fe-4S] clusters 
were present, they would form a dead-end for electron transfer from 
the [3Fe-4S] cluster of ActB, suggesting that their absence from the 
structure is not an artefact. 

The [3Fe-4S] cluster in ActB is the most probable initial oxidant of 
menaquinol bound to ActC, and is 12.3 Å from the nearest haem c in 
ActA. The five haems c in ActA plus the single haem c in ActE form 
a probable electron transfer wire from the [3Fe-4S] cluster in ActB, 
with the largest edge-to-edge distance of 9.2 Å between adjacent haems 
(Fig. 2b). The [4Fe-4S] cluster in ActB appears to be off-pathway and 
its function remains to be determined. 

In all Flavobacteria, including F. johnsoniae, ActA is predicted 
to have a monohaem domain at the N terminus in addition to the 
pentahaem domain at the C terminus (Supplementary Discussion). Mass 
spectrometry analysis shows that the N-terminal monohaem domain 
is present in the preparation (Extended Data Fig. 1), but no density 
can be assigned to this entire domain. The inability to resolve the mono- 
haem domain may result from flexibility of the domain. Full-atom 
molecular dynamics simulations were performed for the entire structure 
of ACIII embedded in a phospholipid bilayer to determine the 
stability and dynamics of the structure (Extended Data Fig. 8). Notably, 
the pentahaem domain of ActA had the largest root-mean-square deviation 
(r.m.s.d.), which arises mainly from the transmembrane α-helix 
connected to the missing monohaem domain; this is consistent with the 
Fig. 2 | Atomic model of ACIII. a, The overall structure is shown on the 
left, along with separate views of individual subunits. Covalently bonded 
cysteines are shown along with the core iron-sulfur clusters (circled in 
orange). Both covalently linked cysteines and axial-coordinating residues 
are shown for haems (circled in pink). b, The edge-to-edge distances 
between cofactors. 



monohaem domain being unobservable owing to a variable position in 
the complex. Although ActE also had a substantial r.m.s.d., it did not 
appear to correlate with disorder in the cryo-EM map. 

ActD and ActF are transmembrane subunits without bound cofactors, 
and both interact with ActC. It has not been established whether ACIII 
generates a proton motive force coupled to electron transport!®. The 
absence of redox centres in ActC, ActD and ActF suggests that if ACIII 
contributes to the transmembrane proton gradient, it does not use 
the bifurcation-type Q-cycle mechanism of canonical complex III’, 
but instead functions as a true proton pump with a mechanism that 
resembles that of complex I²⁰. ActD has two transmembrane α-helices 
that cross within the membrane and are adjacent to ActC. Both N 
and C termini are within the cytoplasm and combine to form a single 
globular domain that rests on the cytoplasmic surface of ActC. The 
ten transmembrane α-helices of ActF form a pseudo two-fold axis of 
symmetry with the ten transmembrane α-helices of ActC (Extended 
Data Fig. 5), despite the fact that ActF has less than 20% sequence 
identity with ActC. If ACIII is a proton pump, it is likely that conserved 
polar residues within the bilayer will have important roles. 

The structure of ACIII reveals eleven ordered phospholipid mole- 
cules as well as triacylated cysteine residues at the N termini of ActB 
(Fig. 3a) and ActE (Extended Data Fig. 7). The anchoring of bacterial 
membrane proteins by an N-terminal triacylated cysteine is a well-characterized 
phenomenon²¹; however, to our knowledge, this is the 
first time the structure of a triacylated cysteine residue has been determined 
in the context of a membrane protein. Both lipid anchors are 
tilted with respect to the plane of the lipid bilayer (Fig. 2a), restricting 
the ability of other lipids to pack around them. This feature could alter 
the mechanical properties of the adjacent portion of the membrane 
bilayer, and also guide conformational changes in the ACIII protein. 
Notably, the two N-terminal lipid anchors are adjacent to each other 
in the membrane. These lipid anchors probably help ACIII to assemble 
and keep the monohaem ActE bound to the complex. The eleven lipids 
that are resolved adjacent to the transmembrane α-helices accommodate 
the rugged protein surface of the complex (Fig. 3b, Extended Data 
Fig. 7). The head groups of the lipids could not be identified and were 
all modelled as phosphatidylethanolamine. There are two ‘hot spots’ 
for resolved lipids: the cytoplasmic interface between ActC and ActF; 
and the vicinity of the triacylated cysteine of ActB, which is near the 
proposed entry point for menaquinol into the complex. All eleven of 
the resolved lipids remained bound to the protein throughout 250 ns 
of molecular dynamics simulation (Extended Data Fig. 8), supporting 
the ability of SMA nanodiscs to preserve some native lipid-protein 
interactions and suggesting a functional role for the lipids. A large 
number of annular lipids, including those modelled in the structure, 
were observed to associate with the protein from the in silico bilayer. 

Frequently, the subunits encoding ACIII are within an operon 
that includes subunits of an associated complex IV? (cyt aa3 or cyt 
caa3). We find that the sequences of subunit III from complex IVs that 
are associated with ACIII have unusual features that distinguish 
them from the canonical subunit III (Supplementary Discussion). 
Fig. 3 | Lipids in the structure of ACIII. a, Triacylated cysteine at the 
N terminus of ActB. The triacylated cysteine and its downstream ten 
amino acid residues are shown in the context of the experimental density 
map. b, Other resolved lipids near the transmembrane α-helices. Four 
lipid molecules are resolved at the cytoplasmic interface between ActC 
and ActF. Alongside, two lipid molecules are clustered near the triacylated 
cysteine from ActB, directly above the proposed quinone entry pathway. 


Whereas subunit III of complex IV generally contains seven transmembrane 
α-helices, those that are associated with ACIII lack TM1 and 
TM2 (Fig. 4a). Although only parts of subunit III of cyt aa3 are resolved 
to better than 4 Å, the density for cyt aa3 has sufficient resolution to 
identify five α-helices from the structure. A homology model of subunit 
III from F. johnsoniae cyt aa3 was built on the basis of the structure of 
TM3 to TM7 of subunit III from Rhodobacter sphaeroides cyt aa3, and 
fit into the ACIII-cyt aa3 supercomplex density map (Extended Data 
Fig. 9) with high fidelity. The deletion of the first two transmembrane 
α-helices in subunit III of cyt aa3 appears to be a necessary adaptation 
to enable formation of the supercomplex with ACIII. It is notable that 
the same two helices in subunit III are also absent in the cyt aa3 of the obligatory 
cyt bcc-cyt aa3 supercomplex found in Actinobacteria (for example, 
Corynebacterium glutamicum and Mycobacterium tuberculosis)”. 

The sequence analysis also reveals that the loop between TM5 and 
TM6 of subunit III in the cyt aa3 that is part of the supercomplex 
is much longer in F. johnsoniae (and all Flavobacteria) than in 
other organisms. Typically, this loop contains eight residues, but in 
F. johnsoniae it contains 121 residues (Fig. 4a). Part of this long loop fits 
in a groove between ActB and ActD of ACIII on the periplasmic side of 
the membrane (Extended Data Fig. 9). The structural model reveals a 
π-cation interaction between Trp188 of subunit III and Arg868 of ActB 
(Fig. 4b), both of which are conserved among organisms containing 
subunit III with a long loop between TM5 and TM6 (Extended Data 
Fig. 9). This specific and strong interaction stabilizes the ACIII-cyt 
aa3 supercomplex and appears to be a second adaptation that enables 
the formation of a supercomplex with ACIII. The contact between the 
periplasmic loop of subunit III of cyt aa3 and ACIII is the only observed 
direct contact between the two complexes. The five well-resolved 
transmembrane α-helices of subunit III of cyt aa3 are angled away from 
ACIII with only the tip of TM6 of subunit III touching ActF, forming 
a wedge-like space between the membrane domains of ACIII and cyt 
aa3. The decrease of resolution in the portions of cyt aa3 that are distant 
from the interface with ACIII suggests that there may be several 
conformations of the supercomplex that are all tethered by the loop in cyt 
aa3. The loop could, therefore, serve as a hinge, enabling the membrane 
domains of ACIII and cyt aa3 to swing into contact transiently. 

Using the location of TM3 to TM7 of subunit III within the 
supercomplex as a guide enables a model of the entire cyt aa3 to be placed 
within the density map for the supercomplex (Extended Data Fig. 9). 
Fig. 4 | ACIII-cyt aa3 supercomplex in F. johnsoniae. a, Topology 
comparison of subunit III of cyt aa3 from different species. b, Tryptophan 
188 from the subunit III loop (orange) of cyt aa3 interacts with arginine 
868 from subunit B of ACIII. The surface representation of ACIII on 
the right shows the binding pocket for the subunit III loop. c, Working 
model for electron transfer in the ACIII-cyt aa3 supercomplex. 


In the resulting model, there is a considerable distance (56 Å) between 
the haem c in ActE of ACIII and CuA within subunit II of cyt aa3. 
Electron transfer within the supercomplex does not require the 
addition of exogenous cyt c, which is also the case for the cyt bcc-cyt aa3 
supercomplex from C. glutamicum. It is possible, although it seems 
unlikely, that there is a subset of conformations in which ActE comes 
close enough to cyt aa3 for direct electron transfer. It is noteworthy 
that the monohaem domain of ActA has substantial sequence 
homology (around 30% identity) with the haem c domain that is present at 
the C terminus of subunit II of cyt caa3 from T. thermophilus. This 
observation suggests that the ActA monohaem domain, which we 
postulate to be highly mobile in the structure (see above), may be 
able to interact with subunit II of cyt aa3 and shuttle electrons from 
the ActE monohaem domain to subunit II of cyt aa3. As such, electron 
transfer within the supercomplex may require the monohaem domain 
of ACIII to swing back and forth between ACIII and cyt aa3 to shuttle 
electrons (Fig. 4c). Additional experimental work will be required to 
test this model and, indeed, to determine the physiological advantage 
of forming the supercomplex. 

We note that, contemporaneously with our studies, 
Sousa et al. determined the structure of the homologous ACIII from 
Rhodothermus marinus by cryo-EM at 3.9 Å resolution. Aside from 
species-specific variations, the reported structures of the ACIIIs of 
F. johnsoniae and R. marinus are compatible. The observation made 
here that, like canonical complexes III and IV, ACIII and an adapted 
complex IV can also form a supercomplex hints at the importance of 
supercomplexes in oxidative phosphorylation. We demonstrate that 
high-resolution cryo-EM with SMA nanodiscs, which preserves native 
protein-protein and protein-lipid interactions, is ideally suited to have 
an important role in future studies. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0061-y. 
METHODS 

Bacterial strain and growth conditions. Flavobacterium johnsoniae ATCC 17061 
strain UW101 was used in this study. The strain was a gift from M. McBride at the 
University of Wisconsin, Milwaukee. The cells were grown in casitone-yeast extract 
medium at 30 °C under high aerobic conditions (500 ml cultures in 2 l flasks). 
Membrane preparation and protein purification. Cells grown overnight were 
collected by centrifugation (14,000g for 10 min). The cell pellet from 12 l of culture 
(~2.5 g l⁻¹) was resuspended in ~200 ml of 20 mM Tris-HCl buffer, pH 8 (buffer 
A) with 5 mM MgSO4, DNase I (Sigma) and a protease inhibitor cocktail (Sigma). 
This suspension was passed three times through a Microfluidizer at a pressure 
of 80,000 psi to disrupt the cells. The cell extract was centrifuged at 14,000g for 
10 min to remove unbroken cells. Membranes were obtained after centrifugation 
at 185,500g for 4 h. Under the above growth conditions, the membranes contained 
ACIII, cyt aa3 and cyt bd. The membrane pellet was solubilized by using either a 
traditional detergent or the SMA copolymer. 

Purification using Triton X-100 and DDM. The membrane pellet was 
resuspended in buffer A (~50 mg ml⁻¹) along with 300 mM NaCl, and solubilized by 
the addition of Triton X-100 (Fisher Scientific) to a final concentration of 4%. The 
solution was incubated at 4 °C for 2 h with mild agitation. The suspension was 
cleared by centrifugation at 185,500g for 1 h, after which the detergent was diluted 
fourfold by adding three volumes of buffer A to the supernatant. The diluted 
supernatant was then added to a chromatography column containing 10 ml of Ni-NTA 
resin (Qiagen) pre-equilibrated with 20 mM Tris-HCl pH 8 containing 0.05% 
Triton X-100 and 0.15 M NaCl (buffer B). The resin was washed with about ten 
column volumes of buffer B to remove any unbound sample. Detergent exchange to 
n-dodecyl-β-D-maltoside (DDM; Anatrace) was carried out by washing with buffer 
B containing 0.05% DDM instead of Triton X-100 (buffer C). The column was 
further washed with five column volumes of buffer C containing 10 mM imidazole 
to remove the loosely bound proteins from the resin. The proteins that were well 
bound to the resin were eluted using 100 mM imidazole in buffer C. The eluent 
was concentrated to around 3 ml using Amicon Ultra-15 filters (Millipore) with 
a 100-kDa cutoff. The excess imidazole was removed by dialysis against buffer C. 
The yield of protein obtained was about 0.3 mg l⁻¹ of ACIII and 0.16 mg l⁻¹ of cyt 
aa3 from 12 l of culture. When indicated, the proteins were further purified by gel 
filtration chromatography using a Superdex 200 10/300 GL column (GE Healthcare 
Life Sciences). The purified proteins were stored at −80 °C after adding glycerol to 
a final concentration of 10%. 

Purification using SMA copolymer. The SMA copolymer SMA 3000HNA (styrene 
maleic acid copolymer, ~3:1 molar ratio of styrene:maleic acid) was a gift from 
T. Bricker (Louisiana State University), who successfully used SMA copolymer made 
by Cray Valley USA (now Total Petrochemicals & Refining USA) for 
studies of photosystem from spinach thylakoids. Additional SMA 3000HNA 
was provided by Total Petrochemicals & Refining USA as an aqueous solution of 
25.6% (w/v) SMA. We also used a similar product, Xiran SL25010 S25, provided 
by Polyscope Polymers B. V., with similar results. These polymer preparations are 
provided as aqueous solutions of the sodium salt, and the polymer solutions were 
simply diluted to the final desired percentage to use directly for the solubilization 
of membranes. The purification protocol with the SMA copolymer was similar to 
that described with detergents with the following differences. After the membrane 
pellet was resuspended, the SMA solution was added dropwise to a final concentra- 
tion of 1% with continuous stirring. After incubation for 1 h at room temperature, 
the solution was centrifuged at 185,500g for 1 h to remove unsolubilized particles. 
The supernatant was loaded directly to the Ni-NTA column equilibrated with 
20mM Tris-HCl pH 8, 0.15 M NaCl. The remaining steps of the purification were 
as described above. After solubilization of the membrane suspension with 1% SMA 
3000HNA, no additional SMA or detergents were added, and none were needed to 
maintain the solubilized proteins in solution. The yield of protein after the use of 
the SMA copolymer for solubilization was about 0.5 mg l⁻¹ for ACIII and about 
0.15 mg l⁻¹ for cyt aa3 from 12 l of culture. 

Analytical methods. The total protein concentration was determined using the 
BCA kit (Thermo Scientific, Pierce Protein Research Products). The UV-visible 
absorption spectra of the oxidized and reduced proteins were recorded on an 
Agilent Technologies spectrophotometer (model 8453). The pyridine 
haemochrome assay was used to determine the concentration of haems present in 
the protein samples. The total haem c concentration was divided by seven to 
calculate the ACIII concentration and the total haem a concentration was 
divided by two to calculate the cyt aa3 concentration. The purified protein 
was analysed by SDS-PAGE using 4-20% precast gels (Nusep Tech). Haem 
staining was carried out using 3,3′,5,5′-tetramethylbenzidine (TMBZ). The 
supercomplex was visualized by blue native PAGE (BN-PAGE) using a 4-16% 
gel (Novex, Life Technologies) with Bis-Tris buffer. The entire gel was stained 
with Coomassie blue, and then fixed with 30% methanol and 10% glacial 
acetic acid. The gel was destained with 8% glacial acetic acid to visualize the 
bands. Peptide mass spectrometry and analyses were carried out by P. Yau at 
the Roy J. Carver Biotechnology Center at the University of Illinois at 
Urbana-Champaign. 
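
The haem-based quantification described above amounts to dividing each total haem concentration by the haem stoichiometry of the corresponding complex. A minimal sketch of that arithmetic follows; the function name and the input concentrations are hypothetical and are not measurements from this study.

```python
# Haem stoichiometries stated in the text: 7 haem c per ACIII, 2 haem a per cyt aa3.
HAEM_C_PER_ACIII = 7
HAEM_A_PER_CYT_AA3 = 2


def complex_concentrations(haem_c_um, haem_a_um):
    """Convert total haem concentrations (µM) into complex concentrations (µM)."""
    aciii_um = haem_c_um / HAEM_C_PER_ACIII
    cyt_aa3_um = haem_a_um / HAEM_A_PER_CYT_AA3
    return aciii_um, cyt_aa3_um


# Hypothetical input values, for illustration only.
aciii, aa3 = complex_concentrations(haem_c_um=42.0, haem_a_um=4.0)
print(f"ACIII: {aciii:.1f} µM, cyt aa3: {aa3:.1f} µM")
```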

Oxygen consumption assay. Oxygen consumption was measured using a Clark 
electrode (Strathkelvin) in a 1 ml chamber at 25 °C as previously described. The 
reaction mix consisted of 100 µM ubiquinone-1 (Q1; Sigma-Aldrich) and 5 mM 
dithiothreitol in air-saturated 0.1 M potassium phosphate buffer, pH 7.5 with 
150 mM NaCl. The reaction was started by adding the purified protein into the 
chamber. The initial concentration of oxygen was calculated to be 237 µM. 
Quinol:cytochrome c oxidoreductase activity. The quinol:cytochrome c oxidore- 
ductase activity of the ACIII was measured spectrophotometrically as described 
previously. The reaction was carried out in a 2 ml anaerobic cuvette, at 25 °C 
in 50 mM potassium phosphate buffer, pH 7.5 in the presence of 50 µM horse 
heart cytochrome c (Sigma-Aldrich) and 200 µM KCN. Ubiquinol-1 (Q1H2) or 
reduced vitamin K, (Sigma-Aldrich) were used as quinol substrates and, in each 
case, the quinone was reduced using sodium borohydride according to a 
previously described method. The reaction was started by the addition of 100 µM of 
reduced quinone. 

EPR spectroscopy. The purified ACIII-cyt aa3 supercomplex was extensively dia- 
lysed against 20 mM Tris-HCl buffer, pH 8, with 150mM NaCl and 1mM EDTA 
to eliminate adventitious transition metal ions. The sample was concentrated in 
an Amicon filter to 150 µl with a final ACIII concentration of around 60 µM. 
The air-oxidized sample was directly transferred to an X-band EPR tube and sub- 
sequently frozen in liquid nitrogen. The sample was oxidized completely by the 
addition of 2mM potassium ferricyanide. Glycerol (5%) was present in all EPR 
samples. Continuous wave EPR measurements were carried out on an X-band 
Varian EPR-E122 spectrometer at the Electron Paramagnetic Resonance facility 
at the University of Illinois at Urbana-Champaign. Cryogenic conditions below 
77 K were achieved with a Lakeshore 331 temperature controller using a regulated 
flow of helium gas. 

Metal analysis. Metal analysis was carried out using inductively coupled plasma 
mass spectrometry (ICP-MS) as previously described. 

Optical redox titration. Full spectrum UV-visible redox titrations were performed 
to determine the midpoint potentials (Em) of the redox-active cytochromes in the 
DDM-solubilized ACIII-cyt aa3 supercomplex. The purified supercomplex 
was suspended in 4 ml of 50 mM potassium phosphate buffer pH 7.0 to a 
concentration of 3 µM with 25 µM each of the following redox mediators: benzyl viologen 
(Em,7 = −350 mV), anthraquinone-2-sulfonate (Em,7 = −225 mV), 
2-hydroxy-1,4-naphthoquinone (Em,7 = −220 mV), 
9,10-anthraquinone-2,6-disulfonate (Em,7 = −185 mV), duroquinone (Em,7 = 5 mV), 
N-ethylphenazonium ethosulfate (Em,7 = 65 mV), N-methylphenazonium 
methosulfate (Em,7 = 85 mV), diaminodurene (Em,7 = 275 mV), 
2,6-dimethyl benzoquinone (Em,7 = 180 mV), 1,2-naphthoquinone (Em,7 = 143 mV), 
1,4-naphthoquinone (Em,7 = 36 mV) and potassium ferricyanide (Em,7 = 435 mV). 
Titrations were performed with an anaerobic stirred cuvette and the solution 
potential was adjusted by injecting aliquots of 10 mM sodium dithionite or 
potassium ferricyanide as reductant and oxidant, respectively. Spectra were taken 
at approximately 10-20-mV increments over the titration range indicated. 
Spectroscopic changes of the α-bands of the haems upon reduction or oxidation 
were monitored at the peak maxima to determine the midpoint potentials of each 
class of haem centre. The datasets were analysed using Origin (Origin Laboratory 
Corporation) to determine spectral components and fit titration curves using the 
Nernst equation. 
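
For reference, a single-component Nernst curve of the kind fitted to such titrations can be sketched as follows. This is an illustration only and not the Origin procedure used in the study; the function name, the one-electron default and the temperature of 298.15 K are assumptions.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def fraction_reduced(e_mv, em_mv, n=1, temp_k=298.15):
    """Fraction of a redox centre that is reduced at solution potential e_mv (mV).

    Derived from E = Em + (RT/nF) ln([ox]/[red]); n and temp_k are assumed values.
    """
    exponent = n * F * (e_mv - em_mv) / 1000.0 / (R * temp_k)
    return 1.0 / (1.0 + math.exp(exponent))


# At the midpoint potential the centre is half reduced.
print(fraction_reduced(e_mv=100.0, em_mv=100.0))
```

Fitting a titration then means adjusting the midpoint potential (and, for overlapping haems, the relative amplitude of each component) until the curve matches the measured absorbance changes.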

Electron microscopy sample preparation. Holey carbon film-coated electron 
microscopy grids were nanofabricated with regular arrays of 500- to 800-nm 
holes and coated with an additional layer of gold. Cryo-EM specimens were 
prepared with a FEI Vitrobot grid preparation robot at 4 °C and 100% humidity by 
applying 3 µl of sample (3 mg ml⁻¹) to glow-discharged grids, allowing the grids to 
equilibrate for 1 s, and blotting for 12 s before freezing in a liquid ethane:propane 
mixture (1:1 v/v). Grids were subsequently stored in liquid nitrogen before 
shipping to the New York Structural Biology Center for imaging with a FEI Titan Krios 
electron microscope equipped with a Gatan K2 Summit camera and automated 
with Leginon. 

Electron microscopy data acquisition. Movies were acquired in electron 
counting mode with a pixel size of 1.1 Å, an exposure rate of 7.4 electrons per pixel per 
second, and a total exposure time of 10 s divided in 40 frames (418 movies) or 
50 frames (1,599 movies). Frame alignment and exposure weighting were 
performed with MotionCor2. After screening averages from the aligned movies, 
475 movies were discarded because of excessive movement, low defocus, high 
defocus, or overfocus. Contrast transfer function parameters were estimated from 
the exposure-weighted averages of movie frames with CTFFIND4. 
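
The acquisition parameters above imply a total exposure per unit area and a retained movie count that are not stated explicitly. The short sketch below derives them from the quoted numbers; the variable names are illustrative, and it assumes that the 475 discarded movies were drawn from the combined 40-frame and 50-frame sets.

```python
# Values quoted in the text.
pixel_size_a = 1.1          # Å per pixel
rate_e_per_pixel_s = 7.4    # electrons per pixel per second
exposure_s = 10.0           # total exposure time, s

# Derived quantities (not stated in the text).
total_e_per_pixel = rate_e_per_pixel_s * exposure_s
total_e_per_a2 = total_e_per_pixel / pixel_size_a ** 2
e_per_pixel_per_frame = {frames: total_e_per_pixel / frames for frames in (40, 50)}

# Assumption: discarded movies come from the combined set of movies.
movies_collected = 418 + 1599
movies_kept = movies_collected - 475

print(f"Total exposure: {total_e_per_pixel:.1f} e/pixel, {total_e_per_a2:.1f} e/Å^2")
print(f"Movies collected: {movies_collected}, kept after screening: {movies_kept}")
```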

Image processing. Particle images (3,044) were manually selected and subjected to 
2D classification with Relion 1.4. The resulting 2D classes were used as templates 
for automatic selection of 899,405 particle images. The number of particle images 
was reduced to 693,416 by further 2D classification. Subsequent image processing 
was carried out in cryoSPARC. An initial map of ACIII-cyt aa3 was obtained 
by ab initio 3D classification, refined to 4.1 Å resolution, and used as a reference 
for the multi-refine procedure in cryoSPARC, producing initial maps of the ACIII 
and the ACIII-cyt aa3. Particle images (164,239) were used to refine the ACIII-cyt 
aa3 map to 3.4 Å resolution, but this map showed the cyt aa3 portion of the complex 
with lower density than the ACIII part. Maps with uniform density for ACIII-cyt 
aa3 and ACIII, both at 3.6 Å resolution, were calculated from 81,530 and 51,547 
particle images, respectively. 

Model building. The 3.4 Å resolution density map was used for the de novo model 
building of ACIII. The density map was first segmented with UCSF Chimera to 
facilitate the identification of subunits. The connectivity of each segmented map 
was further examined and the result was compared with topology predictions from 
TOPCONS and secondary structure prediction from Jpred to validate the subunit 
assignment and identify the directionality of the peptide chain. With this information, 
model building was carried out manually in Coot. Individual chains were first 
traced in Cα baton mode. Readily interpretable features from the density map, 
including regions rich in bulky residues, triacylated cysteines, and axial ligands 
of haem c, were used to register the structure to the sequence. Stretches of ~20 
amino acids were built progressively around these registration points and 
assembled as a single chain in Coot. All six subunits of ACIII were combined and refined 
with phenix.real_space_refine. For cofactors, the starting models were taken 
from the CCP4 ligand library directly. Cofactors were docked to the density map 
with Coot and merged with the apo protein structure. The complete structure 
was then refined with phenix.real_space_refine with geometric constraints for the 
protein-cofactor coordination. The final model was further examined in Coot to 
remove amino acid side chains with ambiguous orientations and further validated 
with MolProbity and EMRinger. All identified lipids with two acyl tails were 
modelled as phosphatidylethanolamine with palmitoyl tails. The conformation 
of phosphatidylethanolamine was refined with interactive molecular dynamics 
flexible fitting (iMDFF) in the presence of the protein structure using VMD. 
Lipid tails were then truncated according to the density map. 

The 3.6 Å resolution ACIII-cyt aa3 density map was used for the model building 
of cyt aa3. Part of the subunit III loop region was manually built in Coot. Homology 
models for individual subunits were generated with the RaptorX server and 
docked into the density map with UCSF Chimera. The model for the ACIII-cyt 
aa3 supercomplex was assembled by fitting the ACIII structure to the ACIII-cyt 
aa3 map and placing the cyt aa3 structure from Rba. sphaeroides (PDB 1M56) into 
the map based on the position of cyt aa3 subunit III. 

Bioinformatic analysis. Homologous protein sequences were retrieved using 
the NCBI blastp server. The blastp results were analysed in Python 2.7 with 
the pandas and Biopython modules. Sequence hits were filtered on the basis of 
coverage and sequence identity. Representative sequences were selected on the basis 
of sequence identity to maintain the variations in sequence and aligned using the 
Clustal Omega server. Figures for sequence alignment were prepared using the 
ESPript 3.0 server. 
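
The filtering of hits by coverage and sequence identity can be illustrated with a short sketch. It is not the authors' script: it uses only the Python 3 standard library rather than pandas and Biopython, and the three-column input layout, the sample hits and the thresholds are all hypothetical.

```python
import csv
import io

# Hypothetical tab-separated hits: subject id, percent identity, query coverage (%).
SAMPLE_HITS = """\
seqA\t92.5\t98
seqB\t45.0\t95
seqC\t88.0\t40
seqD\t71.3\t85
"""


def filter_hits(handle, min_identity=50.0, min_coverage=80.0):
    """Keep hits whose identity and coverage both meet the (assumed) thresholds."""
    kept = []
    for subject, identity, coverage in csv.reader(handle, delimiter="\t"):
        if float(identity) >= min_identity and float(coverage) >= min_coverage:
            kept.append(subject)
    return kept


print(filter_hits(io.StringIO(SAMPLE_HITS)))
```

Requiring both criteria at once excludes short local matches with high identity (seqC here) as well as full-length but distant hits (seqB).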

Simulation system preparation. The initial ACIII structure for the MD simu- 
lation was obtained from the refined structure determined by cryo-EM. Eleven 
1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE) lipids resolved 
by cryo-EM were added to the ACIII system, which was subsequently embedded 
in a POPE membrane bilayer, solvated with the TIP3P water model, and ionized 
with 150 mM NaCl. 

ReMDFF simulation. Resolution-exchange molecular dynamics flexible 
fitting (ReMDFF) was used for structure refinement, with the CHARMM36m 
force field for proteins and the CHARMM36 force field for lipids. Force-field 
parameters for haems and iron-sulfur clusters came from previous studies. 
The fitting was performed in vacuum in the presence of a grid potential derived 
from the experimental density map (coupling factor 0.3). Secondary structure 
restraints, cis-peptide bond restraints, and chirality restraints were applied to 
the protein. Haems and iron-sulfur clusters were harmonically restrained 
(k = 50 kcal mol⁻¹ Å⁻²). A Langevin thermostat was used for maintaining the 
average temperature at 80 K. The MD integration time step was 1 fs. The cut-off 
radius for nonbonded interactions was set to 10 Å with a switching function taking 
effect at 9 Å. A total of six replicas were used together with six grid potentials of 
decreasing resolution. Each was first energy-minimized for 2,000 steps and then 
equilibrated for 1 ps. Finally, 2,000 replica exchanges were attempted with 1 ps 
between attempts. 

Molecular dynamics simulation. The ACIII systems were simulated with NAMD 
2.12 using the same force-field parameters as in ReMDFF. The system was 
energy-minimized for 3,000 steps using the conjugate gradient algorithm with linear 
searching, and equilibrated for 0.5 ns to relax lipid tail group atoms while keeping 
the lipid phosphorus atoms and protein (including haems and iron-sulfur 
clusters) heavy atoms harmonically restrained (k = 1 kcal mol⁻¹ Å⁻²). This procedure 
was followed by a 10-ns simulation to allow lipids to relax around the proteins 
while keeping the protein backbone and heavy atoms from iron-sulfur clusters and 
haems harmonically restrained (k = 1 kcal mol⁻¹ Å⁻²). Restraints were gradually 
released over the next 5 ns and the simulation continued without any biasing 
potential for a total of 250 ns. The angles in the iron-sulfur clusters were 
harmonically restrained to their initial values (k = 300 kcal mol⁻¹ deg⁻²) throughout the 
simulation. 
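
The trajectories were analysed in part through a lipid-protein contact number, defined in Extended Data Fig. 8 as the number of lipid atoms within 4 Å of the protein atoms. A minimal sketch of that count for a single frame follows; the function name and coordinates are hypothetical, the cutoff is treated as inclusive by assumption, and a real analysis would read coordinates from the trajectory and use a neighbour search rather than this brute-force loop.

```python
import math


def contact_number(lipid_atoms, protein_atoms, cutoff=4.0):
    """Count lipid atoms lying within `cutoff` Å of at least one protein atom."""
    count = 0
    for lipid in lipid_atoms:
        if any(math.dist(lipid, prot) <= cutoff for prot in protein_atoms):
            count += 1
    return count


# Hypothetical coordinates (Å) for one frame, for illustration only.
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
lipids = [(0.0, 3.0, 0.0), (5.0, 0.0, 3.9), (0.0, 10.0, 0.0)]
print(contact_number(lipids, protein))
```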

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All relevant data are included in the manuscript or 
Supplementary Information and/or are available from the corresponding authors 
upon reasonable request. Three cryo-EM maps mentioned in this work have 
been deposited in the Electron Microscopy Data Bank (EMDB) under accession 
codes EMD-7286 (combined), EMD-7447 (ACIII-cyt aa3), EMD-7448 (ACIII). 
The coordinates of the atomic model of the alternative complex III built from 
EMD-7286 have been deposited in the Protein Data Bank (PDB) under accession 
code 6BTM. 
Extended Data Fig. 1 | Expression and spectroscopic characterization 
of the ACIII-cyt aa3 supercomplex. a, A schematic of the respiratory 
chain of F. johnsoniae. b, UV-visible spectrum and SDS-PAGE of the 
membranes from F. johnsoniae. Left, the difference spectrum of the 
membranes of F. johnsoniae, obtained from the spectrum of the 
air-oxidized membranes and the spectrum after reduction with dithionite. 
The wavelengths associated with the haem peaks are 605 nm, 560 nm, 
552 nm and a broad peak at 630 nm for haems a, b, c and d, respectively. 
Right, the SDS-PAGE with the membranes followed by staining the gel 
for haems shows bands corresponding to the cytochrome subunits ActA 
(48 kDa) and ActE (20 kDa) of ACIII but no bands corresponding to 
the cytochrome subunit (around 35 kDa) from the cbb3 oxidase. c, The 
gene arrangement for the ACIII and the cytochrome oxidase aa3 genes 
in the F. johnsoniae genome. The genes for the subunits I and II from 
cyt aa3 oxidase are found immediately downstream of those for the act 
genes of the ACIII. Two different versions of subunit III are denoted as vI 
and vII. d, UV-visible spectra of the reduced and oxidized forms of the 
supercomplex in detergent and SMA nanodiscs. The dithionite-reduced 
form of the samples is represented in red and shows the peaks for haem c 
at 524 nm and 552 nm and those for haem a at 443 nm and 605 nm. 
e, Pyridine haemochrome assay of the ACIII-cyt aa3 supercomplex 
in SMA nanodiscs. Plotted is the reduced-minus-oxidized difference 
spectrum of the pyridine haemochromes of the sample. Peaks at 520 nm 
and 550 nm are associated with haem c and the peak at 590 nm is 
associated with haem a. Quantification from the spectrum shows a ratio 
of 10.6:1 between haem c and haem a, which translates into a 3:1 ratio 
between ACIII and cyt aa3 assuming 7 haem c per ACIII and 2 haem a per 
cyt aa3. Data in b are representative of two independent experiments with 
similar results, and data in d and e are representative of six independent 
experiments with similar results. 
Extended Data Fig. 2 | Component and size analysis of ACIII-cyt aa3 
supercomplex. a, SDS-PAGE of the detergent-solubilized preparation 
followed by Coomassie staining (left) and haem staining (right). 
b, SDS-PAGE of the SMA nanodiscs preparation followed by Coomassie 
staining (left) and haem staining (right). c, Mass spectrometry results 
for the ACIII-cyt aa3 supercomplex preparations. d, Size-exclusion 
chromatography with the ACIII-cyt aa3 supercomplex from F. johnsoniae. 
Top left, the chromatogram of the detergent-solubilized sample, showing 
traces for protein at 280 nm, haem c at 412 nm and haem a at 443 nm, 
respectively. Top right, the chromatogram of the sample isolated using 
the SMA copolymer, showing traces for protein at 280 nm, haem c at 
410 nm and haem a at 605 nm. I and II are the two peaks corresponding to 
two populations of the supercomplex. Bottom left, chromatogram of the 
fraction containing peak I. Bottom right, chromatogram of the fraction 
containing peak II. e, BN-PAGE of the ACIII-cyt aa3 supercomplex. Left, 
the detergent-solubilized ACIII-cyt aa3 supercomplex, showing a band 
at around 500 kDa, a smear of possible aggregates and possibly ACIII by 
itself. Right, the supercomplex in SMA nanodiscs, showing two different 
populations. f, BN-PAGE with the two different populations of ACIII-cyt 
aa3 supercomplex in SMA nanodiscs purified from size-exclusion 
chromatography. The two chromatographic peaks correspond to the two 
bands observed in the BN-PAGE. Data in a, b are representative of six 
independent experiments and those in d-f are representative of three 
independent experiments with similar results. 
Extended Data Fig. 3 | Functional assays of the ACIII-cyt aa3 
supercomplex. a, The EPR spectrum of the air-oxidized sample showing 
peaks of the [3Fe-4S]+ cluster from ACIII, the CuA from the cyt aa3 
oxidase and low-spin haems with overlapping g values. Insert is a zoomed 
view from 3,000 G to 3,500 G to better visualize the peaks from CuA (black 
arrows) and the [3Fe-4S]+ cluster. The region between 4,000 G and 5,000 G 
is magnified ten times to show the broad gx trough of low-spin haems. The 
measurement condition is 10 K, 9.267 GHz, 2 mW microwave power and 
20 Gauss modulation. b, The EPR spectra of the ferricyanide-oxidized 
sample at various temperatures. The measurement condition is 9.257 GHz, 
2 mW microwave power and 5 Gauss modulation. c, The EPR spectrum of 
the air-oxidized sample showing peaks of iron-sulfur clusters from ACIII 
and low-spin haems. The measurement condition is 10 K, 9.427 GHz, 
2 mW microwave power, 10 Gauss modulation. d, The EPR spectra 
of the air-oxidized sample at various temperatures. The measurement 
condition is 9.427 GHz, 2 mW microwave power, 5 Gauss modulation. 
e, Redox titration of the haems in the ACIII and the cyt aa3 oxidase in 
supercomplex in DDM. The potentiometric titration of the c haems from 
the ACIII (top) and the a haems from the cyt aa3 oxidase (bottom). The 
Em values are indicated and the solid red line represents the Nernst fitting. 
f, Steady-state activity of the ACIII-cyt aa3 preparations. The number of 
independent experiments is six for ACIII in DDM and SMA nanodiscs, 
and three for peak I and peak II. Data are means ± s.d. Data in a-e are 
representative of three independent experiments with similar results. 
Extended Data Fig. 4 | Single-particle cryo-EM of the ACIII-cyt aa3 
supercomplex in SMA nanodiscs. a, Sum of an aligned movie of the 
ACIII-cyt aa3 supercomplex in an SMA nanodisc. Scale bar, 20 nm. 
b, Two-dimensional class averages. Scale bar, 10 nm. c, Fourier shell 
correlation for the ACIII-cyt aa3 map, ACIII map and combined map. 
d, Surface rendering of the maps coloured according to local resolution. 
Scale bar, 5 nm. e, Euler angle distributions of particles included in the 
calculation of the three final maps. Data collection and structure 
calculation were not repeated. 
Extended Data Fig. 5 | Features observed in the cryo-EM density and 
the de novo structure of ACIII. a, Surface representations of ACIII, 
cyt aa3 and the ACIII-cyt aa3 supercomplex. The density threshold is 
the same for ACIII and cyt aa3. b, Different views of the ACIII density, 
coloured by subunit. c, Two single-span transmembrane peptides of 
unknown origin and sequence, denoted ActX and ActY, are present in 
the structure in the vicinity of ActC. These have each been modelled as a 
polyalanine peptide. d, α-helices 2-10 of ActC form two four-helical 
up-and-down bundles, coloured in two different shades of blue. 
α-helices 1 and 10 are coloured grey and unlabelled. e, ActB, shown 
in cartoon form, has contact with ActA, ActC, ActD, ActE and ActF. 
Surfaces are drawn from residues that are within 4 Å of ActB and coloured 
according to their chain. f, The transmembrane α-helices of ActC and 
ActF are arranged in a pseudo two-fold rotation symmetry. g, Side-by-side 
comparison of the polysulfide reductase (PDB 2VPZ) and the assembly 
of ActB and ActC. These two structures are aligned based on PsrB, the 
domain containing four iron-sulfur clusters. 
Extended Data Fig. 6 | The proposed quinone pocket in ActC. 
a, Sequence alignment of the ActC from F. johnsoniae, R. marinus, and 
Chloroflexus aurantiacus. The transmembrane α-helices are labelled based 
on the structure of ACIII from F. johnsoniae. The black arrows point to 
conserved polar residues that are within 15 Å of the [3Fe-4S] cluster in 
ActB. b, Proposed quinone pocket based on the arrangement of conserved 
polar residues. c, Different views of the proposed quinone pocket with 
a docked menaquinone-1 molecule. Hydrophobic residues near the 
menaquinone-1 (MK1) head group are also shown. The crevice between 
α-helix 3 and α-helix 4 is a putative quinone entry pathway. 
Extended Data Fig. 7 | Fitting of the ACIII structure to cryo-EM density. 
a, Fitting of cofactors into the cryo-EM density. The blue mesh is drawn 
with a higher density threshold to reveal metal centres. The numberings of 
nearby amino acid residues, which are shown along with these cofactors, 
are listed below each cofactor. b, Fitting of different secondary structure 
elements to cryo-EM density. c, Eleven identified lipids are modelled as 
phosphatidylethanolamine molecules. d, The triacylated cysteine at the 
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N terminus of ActE shown along with 15 downstream amino acids. 
Notably, residue Tyr28 is in contact with the covalent lipid of ActE. 
Attachment of ActE to the membrane may also be assisted by aromatic 
residues Tyr30 and Phe31, which appear to be inserted into the lipid 
bilayer. Throughout the molecular dynamics simulation trajectory, these 
residues remain buried in the lipid bilayer. 
Extended Data Fig. 8 | Protein stability and lipid-protein interaction 
analysis based on molecular dynamics simulations. a, Root-mean-square 
deviation (r.m.s.d.) of the protein backbone heavy atoms for the 
entire ACIII complex and each subunit, aligned based on ACIII backbone 
heavy atoms from three independent molecular dynamics simulations. 
b, Same as a, but aligned using the backbone heavy atoms of each subunit. 
c, Superposition of the initial (black) and final (coloured) conformations 
of each subunit after 250 ns of simulation (aligned using backbone heavy 
atoms). d, The lipid-protein contact number defined by the number of 
lipid atoms within 4 Å of the protein atoms calculated over the time course 
of the simulation. This contact number is either calculated for the eleven 
lipids resolved by cryo-EM (top) or all membrane lipids (bottom). e, The 
lipid-protein contact number for each of the eleven cryo-EM resolved 
lipids. f, Isosurfaces (50%) of the atom-occupancy map for the lipid 
anchors (orange), cryo-EM resolved lipids (red) and other membrane 
lipids (purple), calculated using the last 230 ns of the simulation trajectory. 
The stronger the lipid-protein interactions, the longer the local residence 
time, which leads to higher atom-occupancy values. ACIII subunits 
C, D, and F are shown in silver. For all plots, the raw data are shown as 
translucent thin lines and the block-averages are shown as dark lines. 
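
The contact number defined in panel d can be illustrated for a single simulation frame with a minimal sketch. This is not the authors' analysis code: the function name `contact_number`, the toy coordinates and the choice to treat "within 4 Å" as an inclusive cutoff are all assumptions made for illustration.

```python
import math

def contact_number(lipid_atoms, protein_atoms, cutoff=4.0):
    """Count lipid atoms lying within `cutoff` angstroms of any protein atom.

    Hypothetical helper: coordinates are (x, y, z) tuples in angstroms,
    and the cutoff is treated as inclusive (an assumption).
    """
    count = 0
    for lipid_atom in lipid_atoms:
        # A lipid atom is counted once, however many protein atoms it touches.
        if any(math.dist(lipid_atom, protein_atom) <= cutoff
               for protein_atom in protein_atoms):
            count += 1
    return count

# Toy single-frame coordinates, invented for illustration only.
lipids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
protein = [(0.0, 3.0, 0.0), (5.0, 0.0, 0.0)]
print(contact_number(lipids, protein))
```

Repeating the count for every frame of a trajectory would give a time course of the kind plotted in panels d and e. A real analysis would more likely use a neighbour search from a trajectory-analysis library than this quadratic loop.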
Extended Data Fig. 9 | Structural basis for supercomplex formation 
between the ACIII and the cyt aa3. a, Two contact areas between the 
ACIII and the cyt aa3: the transmembrane portion of subunit III (red) and 
the loop from subunit III (orange). A homology model of subunit III fits 
the transmembrane density. The loop is modelled to the cryo-EM density. 
The sequence of the peptide is also shown. Trp188 and Phe189 are used 
to register the density to the sequence. b, Model of the ACIII-cyt aa3 
supercomplex. The cyt aa3 structure from Rba. sphaeroides was positioned 
based on the transmembrane portion of subunit II. α-helices 1 and 2 of 
subunit III are omitted to avoid steric clashes with the ACIII structure. 
c, Sequence alignment of subunit III with a long loop (highlighted with 
the orange bar) between α-helix 5 and α-helix 6 (numbered according to 
subunit III from Rba. sphaeroides). Trp188 (black arrow) is conserved. 
d, Sequence alignment of ActB from organisms with a long loop in subunit 
III of their cyt aa3 oxidase. Arg868 (red arrow) is largely conserved with 
occasional substitution to lysine. 
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics 

Columns: Combined (EMDB-7286) (PDB 6BTM); ACIII-cyt aa3 (EMDB-7447); ACIII (EMDB-7448) 

Data collection and processing 
  Magnification 
  Voltage (kV) 
  Electron exposure (e⁻/Å²) 
  Defocus range (µm) 
  Pixel size (Å) 
  Symmetry imposed 
  Initial particle images (no.) 
  Final particle images (no.) 
  Map resolution (Å) 
    FSC threshold 
  Map resolution range (Å) 

Refinement 
  Initial model used (PDB code) 
  Model resolution (Å) 
    FSC threshold 
  Model resolution range (Å) 
  Map sharpening B factor (Å²) 
  Model composition 
    Non-hydrogen atoms 
    Protein residues 
    Ligands 
  B factors (Å²) 
    Protein 
    Ligand 
  R.M.S. deviations 
    Bond lengths (Å) 
    Bond angles (°) 

Validation 
  MolProbity score 
  Clashscore 
  Poor rotamers (%) 
  Ramachandran plot 
    Favored (%) 
    Allowed (%) 
    Disallowed (%) 

*Determined with cryoSPARC 
†Determined with Phenix.mtriage 
‡Mean value of the B factors determined with Phenix.real_space_refine 
ILLUSTRATION BY THE PROJECT TWINS. 


SCIENCE GOES VIRTUAL 


Virtual- and augmented-reality tools allow researchers to view and share data 
as never before. But so far, they remain largely the tools of early adopters. 


BY DAVID MATTHEWS 


As I put on a virtual-reality (VR) headset, 
the outside world disappears. A cell 
fills my visual field, and as I crane my 
neck, I can see it from several angles. I stick 
my head inside to explore its internal structure. 
Using hand controllers, I dissect the cell layer 
by layer, excavating with a flick of the wrist to 
uncover tiny, specialized structures buried 
beneath the surface. 

Looking at a cell in VR is “as close as you can 
get to touching” such a minuscule structure, 
says Sebastian Konrad, product manager for VR 
at Arivis, a life-sciences software company in 
Munich, Germany, that developed this particular 
VR visualization tool, called InViewR. Konrad 
helped to arrange my demonstration of it. 

VR isn't new, but interest in the technology 
has boomed among gamers and a handful of 
scientists since 2016, when several high-quality, 
relatively inexpensive commercial headsets 
were introduced to the public. A similar surge has 
emerged in augmented reality (AR), a related 
technology that uses a see-through visor or 
smartphone screen to layer objects on top of 
real surroundings. 

Some scientists see VR and AR as more 
intuitive to use than conventional flat screens 
for viewing complex 3D structures. Others 
have sought cheap, smartphone-based headsets, 
which use a smartphone screen as the 
goggles, to increase public understanding of 
their work. Their numbers are relatively small: 
VR and AR remain niche tools for scientific 
research. Yet some researchers say that the 
technology has provided new insights. 

Adam Lacy-Hulbert is a principal investigator 
at the Benaroya Research Institute 
in Seattle, Washington. He is particularly 
interested in lysosomes — structures that 
help to clean up the insides of cells. But he 
was perplexed by some of the 2D images he 
was getting using conventional microscopy. 
“It looked as if part of the lysosomes of the 
cell had moved into the nucleus, which didn’t 
really make sense to us.” 

But ConfocalVR, a tool developed at 
Benaroya that uses VR to visualize images 
from confocal microscopes, made what was 
really happening “jump out within seconds”, 
Lacy-Hulbert says. The nucleus was actually 
deforming and moving around the lysosomes. 

Wilian Cortopassi, a postdoctoral 
researcher at the University of California, 
San Francisco, has also gained scientific 
insights from VR explorations. ChimeraX is 
a molecular-visualization tool for proteins 
and other structures, which added support for 
VR headsets in November 2016. ‘Walking’ in 
virtual space through a network of hydrogen 
bonds helped Cortopassi to understand how 
certain mutants of a protein could stymie 
drugs that target it. A computer monitor is 
“so messy when you turn on a lot of atoms 
for visualization”, Cortopassi says. But in VR, 
“you can just walk through the hydrogens at 
different angles and distances, and quickly 
detect some important interactions”. 


GOGGLE-EYED 

Although inexpensive options are available, 
most visualization tools work only with the 
priciest headsets — such as Facebook’s Oculus 
Rift, and the Vive from Taiwanese electronics 
company HTC — because they can track the 
user’s head and handheld-controller movements 
in 3D space. Researchers and gamers 
have their preferences, but the differences 
between Oculus Rift and Vive are small. 
“I don't think there’s a clear winner at this 
point,” says Tom Ferrin, one of the developers 
of ChimeraX, whose lab at the University 
of California, San Francisco, specializes in 
molecular-visualization tools. 

That said, not every tool is compatible with 
all headsets. InViewR works only with Oculus 
Rift, whereas ChimeraX and ConfocalVR work 
with both. Oculus Rift and Vive both run using 
the Windows operating system, although Vive 
is also compatible with MacOS X. 

VR is computationally intensive, both 
because each eye must see a different image to 
produce a 3D effect, and because those images 
must refresh rapidly. In some cases, a new 
graphics card will add sufficient computing 
power, “but in general you're probably going 
to buy a new computer”, says Tom Skillman, 
director of informatics and research technology 
at Benaroya and one of the creators 
of ConfocalVR. Oculus Rift suggests using 
VR-compatible computers ranging from 
US$850 to nearly $3,100; it recommends at 
least 8 gigabytes of memory and a high-end 
graphics card. 

The VR software itself can also be expensive. 
Although ConfocalVR and ChimeraX 
are free for non-profit entities, that is not true 
for commercial firms. The developers of 
ConfocalVR declined to share pricing 
information, but ChimeraX can cost up to 
$20,000, depending on the number of users. 

For researchers who like to work as a team, 
the developers of ConfocalVR added in April 
the option for up to four users to simultaneously 
view, point to and grab structures in the same 
VR space. This could mean that scientists do 
not have to meet face to face to work together, 
says Skillman, which would potentially reduce 
travel costs. The developers of both ChimeraX 
and InViewR are looking to add similar 
collaborative features in the future. 


AUGMENTING REALITY 

Compared with VR, visualization software for 
AR headsets is less advanced. Mark Hoffman, 
chief research information officer at Children’s 
Mercy Kansas City, a hospital in Missouri, 
has experimented with viewing proteins and 
computed tomography (CT) scans using 
Microsoft’s HoloLens — a kind of visor with 
a built-in computer that projects 3D objects 
over the real world. 

He says that AR is more user-friendly than 
VR because users can see their surroundings 
and so are less prone to disorientation. 
Hoffman actually experiences motion sickness 
in VR, and this is not an uncommon 
complaint. “In all my work with the 
HoloLens, I’ve never been uncomfortable,” 
he says. 

The downside is that, whereas a VR 
headset envelops your entire field of 
view, the HoloLens projects objects only onto 
a relatively narrow rectangle in the centre of 
your vision. “It’s part of the trade-off,” Hoffman 
says. AR is not completely immersive, but it is 
“an enabler to comprehension”, he says. “There 
may be things you can miss on a flat screen” 
that become clearer in AR — protein-protein 
interactions, for instance. 

Surgeons at Children’s Mercy are exploring 
the use of AR to view CT scans of patients’ 
hearts before an operation, he says. Hoffman 
uses a step-by-step approach to make such 
data viewable using the HoloLens. The 
surgeon can explore the tissue by project- 
ing it onto a fixed point in space — say, 
in the middle of the room. But if they turn 
their head, the image disappears and they 
see only what is actually there. “They walk 
into the ventricle or the atrium of the heart, 
and maybe they’ll see that, for a particular 
child, the entry point of a blood vessel is not 
where it normally would be.” The HoloLens 
costs $3,000, and must be ordered from 
Microsoft directly, because it is not available 
in the shops. 


LOW-COST OPTIONS 

Cheaper headsets that use smartphones as 
the screen in a pair of goggles, such as the 
Samsung Gear VR or Google's $15, ultra- 
simple Cardboard, can help researchers to 
reach a broader audience. 

Juicebox VR, an app designed for these 
simple devices, visualizes the connectivity of 
the human genome as a Mars-like landscape 
scarred with a colossal wall, says Erez Aiden, 
a geneticist at Baylor College of Medicine in 
Houston, Texas, whose lab developed the 
tool. The features of the landscape represent 
the topography of condensed DNA in animal 
cells, and the ridge represents intersections 
between different parts of the genome. “When 
people interact with this, they really get a sense 
of what the data look like,” he says. 

Biologists have also adopted Augment, an 
app normally used to illustrate how furniture 
might look in a room, to allow colleagues, 
students and members of the public to 
inspect 3D models of proteins through their 
smartphone screens. 

For researchers interested in creating their 
own visualization tools, Unity — software 
designed by Unity Technologies in 
San Francisco for building games — is one 
of the most commonly used development 
environments. It runs on relatively modest 
hardware, says Muhammad Saad Shamim, 
who used it to help to develop Juicebox VR 
on a Mac Pro. For the HoloLens, users needn't 
be advanced developers to import 3D objects, 
Hoffman says. But they should be comfortable 
with Unity, as well as Microsoft’s Visual 
Studio programming environment. Other 
options include Unreal Engine, from Epic 
Games in Cary, North Carolina, which is free 
for academic users, and OpenGL, a no-cost 
3D-graphics tool used in game development, 
computer-aided design and flight simulators. 
Ferrin, who used OpenGL to create ChimeraX, 
says OpenGL requires more initial work than 
Unity or Unreal because developers need to 
handle more programming details directly, but 
the pay-off is fewer constraints on functionality. 

Despite the broad proliferation of VR and 
AR tools in consumer culture, only a small 
minority of labs currently uses the technology, 
and it remains to be seen how many others will 
follow suit. Yet many advocates predict that VR 
and AR could become standard lab tools over 
the next five years or so. The technology feeds 
information to our brains in three dimensions, 
the way “a million years of evolution” intended, 
says Skillman. It requires an enormous amount 
of intellectual work to construct a 3D mental 
model from a 2D screen, he says. “All that work 
goes away when you put on the goggles.” 


David Matthews is a freelance writer based in 
Berlin, Germany. 
Why mental health matters 


Nature talks to five researchers about the stresses of a 
hyper-competitive environment, and what needs to change. 


BY CHRIS WOOLSTON 


More than 150 scientists contacted 
Nature with their personal stories 
following coverage of an international 
survey showing evidence of a mental-health 
crisis in graduate education (T. M. Evans et al. 
Nature Biotechnol. 36, 282-284; 2018). To kick 
off a series on mental health in academia, we 
talked to five people on the front lines of science 
who were willing to share their insights and 
discuss how changes to the culture might help. 

Next week, we will profile four scientists 
who have experienced severe depression and 
its career consequences. And the week after that, 
we’ll examine health in labs, and ask what kinds 
of lessons can be learnt from other sectors. 


Network for health 


PhD student in engineering at the 
University of Kansas in Lawrence 


I was hospitalized for depression in 2017 — 
and there I learnt the importance of having 
a support network. It makes your struggles 
a little bit easier if you have a community. 
I’ve reached out to people on campus, but 
I also found a community thanks to the 
Cheeky Scientist Association (CSA), a group 
based in Liberty Lake, Washington, that was 
created by careers consultant Isaiah Hankel 
to provide advice and support to researchers 
worldwide. The CSA posts a lot of success 
stories, and reminds its members of the 
value of a PhD. It’s been a big help and a huge 
source of comfort. 

I see a therapist weekly. When I walk in, I’m 
always in a great mood. My therapist validates 
my emotions and reminds me that I’m mostly 
struggling against a flawed system, not with a 
personality flaw. Their continuous encouragement 
has helped me to focus on finishing my 
dissertation and keeping my head in the game. 

Graduate students are suffering, and they 
need help. We have fantastic mental-health 
services on this campus, but a lot of students 
are hesitant to use them. Some are worried 
about costs, but they might be surprised. 
My weekly sessions on campus, for example, 
are 100% covered by student insurance. Likewise, 
some students might not want to raise 
concerns about their adviser or their department 
out of fear of retaliation. I’ve been warned 
not to bite the hand that feeds me. 

As president of the campus Graduate 
Engineering Association, I’m trying to create 
a sense of community and encourage people to 
get out of the lab. We threw the first graduate 
engineering formal in April 2017 at an upscale 
hotel on campus. There was a DJ, a professional 
photographer and even a red carpet. We got 
a huge response. We also have professional- 
development events. People from industry 
come here and help us to go over our CVs. 
We’re going through this together. 

Like a lot of other students, I struggle with 
work-life balance. I’ve faced some criticism for 
devoting time to my leadership role with the 
Graduate Engineering Association. But, thanks 
to my support networks, I have the confidence 
to be more assertive about my choices. 
VINCE BUTITTA 
Praise for papers 


PhD student in limnology at the 
University of Wisconsin- Madison 


I know where my anxiety comes from. Last 
year, I had a paper come out (V. L. Butitta et al. 
Ecosphere 8, e01941; 2017). It was well received 
and got a lot of attention on Twitter. It was the 
first time I felt like I was actually doing science, 
not just playing a part. But then, everything died 
down. Sometimes I go online to get a figure 
from my paper, and see that there aren't any new 
citations. I feel like I’m shouting into the void. 

I still struggle with that particular brand of 
anxiety, but I’m doing what I can to help other 
students who might feel the same way. When 
I see a paper that I find interesting, I make sure 
to send the author an e-mail or message them 
on Twitter. I say: “I just read your paper. It 
helped me with some concepts. I look forward 
to seeing your future work.” It lets people know 
that they have worth. That sort of support 
doesn't have to come from superiors. 

Those messages might help others, but 
they’re also great for me. I connect with other 
researchers, and when I'm at a conference, 
someone might recognize my name tag 
because we've interacted on Twitter. 

My paper still doesn't have a lot of citations, 
but I was invited by a session organizer to 
speak at an annual meeting of the International 
Association for Landscape Ecology last month 
in Chicago, Illinois, because she saw my paper 
on Twitter. Knowing that someone thought 
the paper was good enough for a conference 
gave me a greater sense of satisfaction than did 
publishing it in the first place. 


MATTIAS BJÖRNMALM 
Change the culture 


Research fellow in materials science, 
Imperial College London 


I'm passionate about protecting and supporting 
the mental health of early-career researchers. 
I received my PhD just two years ago, and 
many people in my life are graduate students 
or are working with students. I have a personal, 
emotional connection to their struggles. 
There's an enormous waste of talent and 
resources that we're not addressing. 

The research culture lies at the core of many 
scientists’ mental-health issues. The environment 
is hyper-competitive, and the path for 
success is almost impossibly narrow. That's a 
scenario that breeds anxiety and depression. 
People want to produce as much research and 
as many papers as possible. Anything that takes 
away from that can make life more difficult. It 
is a situation where everyone is pursuing a goal 
that’s almost impossible to reach: the next 
grant, the next fellowship, the next position. 

I'm part of the policy working group for 
an international professional network called 
the Marie Curie Alumni Association. I’d like 
the working group to have a new mission: 
aligning the incentives and rewards of science 
with the type of work and productivity that we 
really want to see. We need to better reward 
non-traditional outcomes, such as data sets, 
research methods and code. And we need to 
better appreciate activities outside of the lab, 
such as public engagement, education and 
outreach. That’s the way towards achieving 
substantial and lasting change. 

We also need to encourage students to pursue 
career prospects both in and out of academia. 
It's amazing to me how prevalent the belief is 
that the right path forward is the tenure-track 
position. People talk about alternative career 
paths, but too often with the connotation that 
it’s for people who didn’t make the cut. 

As a scientist, I’m also interested in the 
evidence. We need to do more to map and 
monitor the situation. The few studies that 
have addressed mental-health issues among 
graduate students had alarming results, but the 
message isn't getting out. There are still schools 
that believe they don’t have a problem. But anyone 
who works with graduate students on a 
day-to-day basis knows that mental-health 
issues are very prevalent. 

For me, and for a lot of people I work with, 
the whole point of science is to make the world 
better in some sense. I’m trying to develop new 
materials as a scientist, but I’m also trying to 
understand our research culture and how we 
can improve it. I think it starts with leadership 
style. If you can create a local environment 
in your research group or your department 
that supports talking about these issues and 
working on ways to improve them, you can 
have a big and immediate impact. 

We want people to do good research, and we 
need them to be healthy. 


FRANZISKA FRANK 
Real-world results 


PhD student in ecology and 
environmental sciences, 
Umeå University, Sweden 


Sometimes I question my worth to society, and 
this doubt has added to my feelings of depression. 
Everyone is publishing and publishing 
because that’s where the money in science 
comes from. But if everyone is publishing and 
nobody is reading, are we making a contribution? 
Are we really doing anything important? 

There's an excellent service at my university 
that offers therapy and counselling. You don’t 
have to wait long for an appointment, and they 
are very familiar with the worries of academics. 
I’ve met doctors in the past who didn't seem 
very interested in the stress that I was feeling. 
But now, there seems to be more awareness. 
If we want to talk about depression and mental 
health, we must acknowledge the progress that 
has already been made. 

Before I started my PhD programme, I did 
some science communication and education, 
and that really gave me a sense of satisfaction 
and validation. We'd take children and their 
parents to a mobile lab to learn about the North 
Sea. That’s something everyone can relate to. 
They get really interested in science, and it's not 
the type of science that comes from a journal. 

What do you get from a journal? You submit 
an article, then you get rejected multiple times. 
Eventually it’s accepted, and you move on to the 
next thing. Unless you get published in a very 
prestigious journal or get a lot of citations, it 
can feel like a downer, even though you accomplished 
what you were supposed to accomplish. 

I would encourage other students to think 
about what they really want from a PhD. Sort 
things out for yourself. Talk to people who are 
important in your personal and professional 
life, and don't forget to work out. And try to 
have a life outside the lab. 


RACHEL PIPER 
Train universities 


Policy manager, Student Minds, 
Oxford, UK 


Our charity works with about 120 universities 
across the United Kingdom. We equip students 
to cope with graduate school whether or not 
they've been diagnosed with a mental-health 
issue. We have training programmes with 
university staff members. We want staff 
members to be able to listen, but they 
shouldn't be the only source of support for 
every student. 

Our main goal is to make sure that every 
university has a strategic response to mental 
health. We support the recommendations 
of the #stepchange framework, which was 
launched by Universities UK in 2017 to help 
improve the mental health of students and 
faculty members in higher education. Universities 
must look at their needs and have a 
specific plan of action to make sure everyone 
has access to support and treatment. 

When the mental-health charity Student 
Minds started in 2009, many universities 
denied that they had a mental-health 
problem on their campus. But the conversation 
has changed. Now, universities say, “We 
know we need to do something, but what’s 
the right thing to do?” 

Students also have to look after each 
other. It’s common for people to tell their 
peers about their troubles but no 
one else. A 2014 UK study by the 
Equality Challenge Unit found that 
75% of students with mental-health 
challenges disclosed the issue 
to peers (see go.nature.com/2qvhd8k). But 
according to the Higher Education Statistics 
Agency, only about 3% of all students in the 
2016-17 academic year formally reported 
a mental-health issue to their university. 
As the discussion continues, hopefully more 
students will feel comfortable reaching out to 
supervisors and administrators. 

People don’t recognize that students have 
a different experience from other young 
people. When it comes to National Health 
Service funding, Student Minds is one of 
a few groups trying to get more student- 
health-care models. There’s a misconception 
that students are privileged and don't 
need extra support. I had my own mental-health 
concerns as a student, and while 
I'm much better now, I know how mental 
health can affect everything. Once you see 
it, you can't unsee it. 

University should be a place where 
someone can thrive regardless of anxiety or 
depression. If you have the right support, 
you can have a diagnosis and still do well. 
If that support isn’t there, you can have no 
diagnosis and still be stressed. Staff should 
see university as an opportunity to support 
people and set them up for their future. 
If you can help them at university, you're 
setting them up for a win. 
Political expatriate 


Theoretical chemist Alan Aspuru-Guzik was 
among many US citizens who talked of moving 
to Canada after the November 2016 election of 
Donald Trump as US president. Now, Aspuru-Guzik 
has made good on his declaration, and 
will begin a new post in July. He explains how 
the US political climate prompted him to leave 
his tenured post at Harvard University in 
Cambridge, Massachusetts, after nearly 
20 years in the country. 


Why are you leaving the United States? 

The nation is at a crossroads. Is it going to 
continue as a civil society in which politicians 
and people from different sides respect each 
other? Or is it going to become a country that 
has lost political decency and dialogue? Why 
not use my skills in a country where I don't 
have to worry about the next national drama, 
and can concentrate on my science and be with 
people who share my values? 


What will you be doing? 

I’ve accepted a post as a Canada 150 Research 
Chair in theoretical and quantum chemistry at 
the University of Toronto, worth Can$1 million 
(US$780,000) a year for 7 years. I'll also be a faculty 
member at the Vector Institute, which is the 
new artificial-intelligence research institute in 
Toronto. 


What is the Canada 150 programme? 

The Canadian government announced last 
year that it would invest Can$117.6 million to 
enhance the country’s “reputation as a global 
centre for science, research and innovation 
excellence, in celebration of Canada’s 150th 
anniversary”. Canadian institutions get a one-off 
lump sum to attract top-tier researchers. 


What disturbs you most about the US political 
environment? 

We don’t have a very civilized way of passing 
budgets, so even though spending for science 
was increased, it’s tied up with military 
increases. We have to try to solve climate-change 
problems. But the United States just left 
the Paris agreement. I am a dual US–Mexican 
citizen. I have been here for 20 years, and it 
doesn't look like it’s getting better. Even when 
the Democrats were in power, the same political 
war was being waged between the parties. 
This is the way democracies end — not by 
coups anymore. 


What was it like to work in the United States as 
a dual national? 

I’ve been lucky to be in some of the most 
inclusive places in the United States. I lived 
roughly half of my time in California and half 
of it in Massachusetts. I have a PhD; I helped 
to launch start-up companies; I’m a professor 
at Harvard and I’ve published a lot. I’m one of 
the very privileged in the United States. But 
how about others who are not? Why should I 
not worry about them? 


Are there drawbacks to vacating your position 
and leaving collaborators? 

I'm leaving a favourable ecosystem. But there 
are many other great places. Toronto is one. 
It’s one of the most diverse cities in the world, 
and Canada is leading the world in artificial 
intelligence and quantum computing. I plan 
to continue my collaborations at Harvard and 
the Massachusetts Institute of Technology in 
Cambridge with key collaborators, and I'll continue 
to expand them in Canada. 


What are your thoughts about leaving the 
United States and Harvard in general? 

All moves are bittersweet. I’m not leaving 
in any way or form because of Harvard. I’m 
thankful for them as a platform for my career; 
they were extremely supportive. Some 
people believe that one should spend forever 
in a single place. I think that shouldn't be the 
case. Sometimes we should do this more often. 
So I also think it’s great that somebody else will 
take my position at Harvard and that there will 
be new activity. 


What do you see as the main cultural 
differences between the United States and 
Canada? 

In Canada, people on the street emphasize 
how welcome you are. And even though you 
have disagreements, you can still respect your 
opponents. 


INTERVIEW BY BRIAN OWENS 


This interview was edited for clarity and length. 
CONCERTO FOR NEW HANDS 


BY ANDREA KRIZ 


My fingers shake with the 
effort of pincering the CD. 
One wrong move, I know, 
and I'll snap it in two. Slowly, jerkily, 
I lower it into the player, press it into 
place and close the lid. No sooner do 
I do so than my arm thumps back 
to the covers. Sweat pours down my 
face. Willing that hunk of metal to 
move again seems more daunting 
than climbing Olympus Mons. 

“Even in that condition, you're 
signing on for another tour?” 

“There's still fighting on Titan, 
I'm told,” I pant. “Don't worry. After 
some training, these will feel just 
like my own.” 

“I'm not talking about your arms, 
Cygnus.” 

“I'm not like you, Halla.” I force a
smile. “Even if my records have been 
wiped. I'll never be a real human. 
Piloting’s the only life I have — I’m trapped.”

“You're not trapped.”

A nurse opens the door. She drops a bunch 
of white lilies into a vase, taking in the scene 
out of the corner of her eye. The charred CD 
player I’m holding in one hand. My other 
hand groping for the headphones, finally 
grasping — only for them to slip through my 
fingers and clatter to the floor. In one fluid 
motion, she scoops them up and nestles the 
buds into my ears. 

“You're making progress,” she says soothingly.

I close my eyes. 

“Who were you talking to?” 

Whenever a member of our mech 
squadron was killed, Halla would do this, 
mouthing her goodbyes. I didn’t even know 
what this beat-up, orange piece of oldtech 
was back then. I waited until she’d fallen 
asleep and dug it out of her locker, pulling on 
the headphones upside down. Maybe I was 
expecting some of those crazy Earth drugs 
I'd heard about. I don't know.

“Oh, you want to listen, too?” 

I tried to hear it. But I could never 
understand those tears that rolled down her 
face. I read and reread the back of the case. 
Ravel. Piano Concerto for the Left Hand. What 
did we have to do with a pianist who lost his 
arm centuries ago? Synthetics like me were 
biologically incapable of emotion, after all. My 
previous commander had often told me. And, 
true to his word, I felt nothing when he died. 
Even when the engineers determined that we 
crashed because of my shortcomings, when
they threw me in a hangar to await incinera- 
tion. I didn't believe in afterlife. But if I did 
— it'd be that squadron who landed, butterfly-like,
on that dead rock of a colony. The commander
who stepped down, haloed in light,
said her co-pilot had been gravely wounded,
that she'd take anybody, anything, instead.

“I'm Halla.”

“I'm a failure.”

Her thumb ran over the serial number 
under my eye. Crossed out. Condemned. 

“I don’t care about any of that. But it 
must've been so, so hard for you.”

I don’t understand. I can’t understand. 
Good synths fight for Earth, get human 
status. Bad synths fight for separatists, get 
torched. Why do I keep hearing her voice? 
I'm only going back where I belong. I reach 
for the table and knock over the vase. The 
nurse hurries over and rights the flowers. She 
settles the tablet into my lap. 

“That boy in the synthetic ward you asked 
me about,” she whispers before replacing 
my earbud. “The one wounded in your last 
bombing run. He passed away.” 

The door shuts. Her steps echo down the 
hall. Should I call her back? I wonder. In my 
chest, an ache I’ve never felt before. No. No 
more thinking. With a swipe of my eyes, I 

open the re-enlist- 
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much to live for. So many people... 
waiting for you...” 

“That's right. I've had such a beautiful
life, Cygnus. But you. I want
you...”

I spasm, my knees digging into 
my forehead. My fists, my entire 
body curls upon itself. 

“How can I have a beautiful 
life?” I whimper. “When I ended so 
many...” 

The piano thunders. I see Halla 
as I did last. Her head in my lap, 
blood trickling from her mouth. 
My palms seared against the twisted 
wreckage above us. Even as every 
movement skewered shrapnel into 
my arms, even as they numbed, I 
kept clawing upwards. Ignoring 
the thawing ice pouring in around 
us. Ignoring the fact that even if I 
forced my way out of the cockpit, I'd
meet enemy fire, I couldn't possibly 
drag her with me. 

“You've gotta promise me, Cyg. I'm leaving 
you all my CDs.” 

“How can I listen to music?” I sob. “I killed
them. Synths just like me.”

“Cygnus. You're always looking up at those 
stars, aren't you? I can't do this numbers 
business. That's what I’m calling you.” 

“They must’ve wanted names... they 
must’ve had people they loved too...” 

“The war's over, Cygnus.” 

I feel her through the drumbeat. 

You're not trapped. Not trapped.

Slowly, painfully, breathing like the 
physical therapist instructed me to, I man- 
age to unclench my fists. I can almost see 
my reflection in their smooth, silver palms. 
They'd paint them to hide the joints, a rep
from the bionics company told me, with 
plastic just like skin, sculpt every scar and 
callus into its former place. But maybe I don't 
want it that way. 

These are new hands. 

Hands that haven't been trained to kill. 

These could be a rocketeer’s hands. 
A cargo-ship driver's. A painter's hands... 

“Halla,” I whisper.

I set the tablet down on my bedside table,
my grip firm. The last notes of the concerto 
fade into a victory march, a final chord. 

“I think I'll learn to play piano instead.”


Andrea Kriz flies with the vultures in 
Cambridge, Massachusetts. Her stories 
have also appeared in recompose and 
Daily Science Fiction. 